claudient 0.1.0
- package/.claude-plugin/plugin.json +42 -0
- package/CONTEXT.md +58 -0
- package/README.md +165 -0
- package/agents/build-resolvers/de/python-resolver.md +64 -0
- package/agents/build-resolvers/de/typescript-resolver.md +65 -0
- package/agents/build-resolvers/es/python-resolver.md +64 -0
- package/agents/build-resolvers/es/typescript-resolver.md +65 -0
- package/agents/build-resolvers/fr/python-resolver.md +64 -0
- package/agents/build-resolvers/fr/typescript-resolver.md +65 -0
- package/agents/build-resolvers/nl/python-resolver.md +64 -0
- package/agents/build-resolvers/nl/typescript-resolver.md +65 -0
- package/agents/build-resolvers/python-resolver.md +62 -0
- package/agents/build-resolvers/typescript-resolver.md +63 -0
- package/agents/core/architect.md +64 -0
- package/agents/core/code-reviewer.md +78 -0
- package/agents/core/de/architect.md +66 -0
- package/agents/core/de/code-reviewer.md +80 -0
- package/agents/core/de/planner.md +63 -0
- package/agents/core/de/security-reviewer.md +93 -0
- package/agents/core/es/architect.md +66 -0
- package/agents/core/es/code-reviewer.md +80 -0
- package/agents/core/es/planner.md +63 -0
- package/agents/core/es/security-reviewer.md +93 -0
- package/agents/core/fr/architect.md +66 -0
- package/agents/core/fr/code-reviewer.md +80 -0
- package/agents/core/fr/planner.md +63 -0
- package/agents/core/fr/security-reviewer.md +93 -0
- package/agents/core/nl/architect.md +66 -0
- package/agents/core/nl/code-reviewer.md +80 -0
- package/agents/core/nl/planner.md +63 -0
- package/agents/core/nl/security-reviewer.md +93 -0
- package/agents/core/planner.md +61 -0
- package/agents/core/security-reviewer.md +91 -0
- package/guides/agent-orchestration.md +231 -0
- package/guides/de/agent-orchestration.md +174 -0
- package/guides/de/getting-started.md +164 -0
- package/guides/de/hooks-cookbook.md +160 -0
- package/guides/de/memory-management.md +153 -0
- package/guides/de/security.md +180 -0
- package/guides/de/skill-authoring.md +214 -0
- package/guides/de/token-optimization.md +156 -0
- package/guides/es/agent-orchestration.md +174 -0
- package/guides/es/getting-started.md +164 -0
- package/guides/es/hooks-cookbook.md +160 -0
- package/guides/es/memory-management.md +153 -0
- package/guides/es/security.md +180 -0
- package/guides/es/skill-authoring.md +214 -0
- package/guides/es/token-optimization.md +156 -0
- package/guides/fr/agent-orchestration.md +174 -0
- package/guides/fr/getting-started.md +164 -0
- package/guides/fr/hooks-cookbook.md +227 -0
- package/guides/fr/memory-management.md +169 -0
- package/guides/fr/security.md +180 -0
- package/guides/fr/skill-authoring.md +214 -0
- package/guides/fr/token-optimization.md +158 -0
- package/guides/getting-started.md +164 -0
- package/guides/hooks-cookbook.md +423 -0
- package/guides/memory-management.md +192 -0
- package/guides/nl/agent-orchestration.md +174 -0
- package/guides/nl/getting-started.md +164 -0
- package/guides/nl/hooks-cookbook.md +160 -0
- package/guides/nl/memory-management.md +153 -0
- package/guides/nl/security.md +180 -0
- package/guides/nl/skill-authoring.md +214 -0
- package/guides/nl/token-optimization.md +156 -0
- package/guides/security.md +229 -0
- package/guides/skill-authoring.md +226 -0
- package/guides/token-optimization.md +169 -0
- package/hooks/lifecycle/cost-tracker.md +49 -0
- package/hooks/lifecycle/cost-tracker.sh +59 -0
- package/hooks/lifecycle/pre-compact-save.md +56 -0
- package/hooks/lifecycle/pre-compact-save.sh +37 -0
- package/hooks/lifecycle/session-start.md +50 -0
- package/hooks/lifecycle/session-start.sh +47 -0
- package/hooks/post-tool-use/audit-log.md +53 -0
- package/hooks/post-tool-use/audit-log.sh +53 -0
- package/hooks/post-tool-use/prettier.md +53 -0
- package/hooks/post-tool-use/prettier.sh +49 -0
- package/hooks/pre-tool-use/block-dangerous.md +48 -0
- package/hooks/pre-tool-use/block-dangerous.sh +76 -0
- package/hooks/pre-tool-use/git-push-confirm.md +46 -0
- package/hooks/pre-tool-use/git-push-confirm.sh +36 -0
- package/mcp/configs/github.json +11 -0
- package/mcp/configs/postgres.json +11 -0
- package/mcp/de/recommended-servers.md +170 -0
- package/mcp/es/recommended-servers.md +170 -0
- package/mcp/fr/recommended-servers.md +170 -0
- package/mcp/nl/recommended-servers.md +170 -0
- package/mcp/recommended-servers.md +168 -0
- package/package.json +45 -0
- package/prompts/project-starters/de/fastapi-project.md +62 -0
- package/prompts/project-starters/de/nextjs-project.md +82 -0
- package/prompts/project-starters/es/fastapi-project.md +62 -0
- package/prompts/project-starters/es/nextjs-project.md +82 -0
- package/prompts/project-starters/fastapi-project.md +60 -0
- package/prompts/project-starters/fr/fastapi-project.md +62 -0
- package/prompts/project-starters/fr/nextjs-project.md +82 -0
- package/prompts/project-starters/nextjs-project.md +80 -0
- package/prompts/project-starters/nl/fastapi-project.md +62 -0
- package/prompts/project-starters/nl/nextjs-project.md +82 -0
- package/prompts/system-prompts/ai-product.md +80 -0
- package/prompts/system-prompts/data-pipeline.md +76 -0
- package/prompts/system-prompts/de/ai-product.md +82 -0
- package/prompts/system-prompts/de/data-pipeline.md +78 -0
- package/prompts/system-prompts/de/saas-backend.md +71 -0
- package/prompts/system-prompts/es/ai-product.md +82 -0
- package/prompts/system-prompts/es/data-pipeline.md +78 -0
- package/prompts/system-prompts/es/saas-backend.md +71 -0
- package/prompts/system-prompts/fr/ai-product.md +82 -0
- package/prompts/system-prompts/fr/data-pipeline.md +78 -0
- package/prompts/system-prompts/fr/saas-backend.md +71 -0
- package/prompts/system-prompts/nl/ai-product.md +82 -0
- package/prompts/system-prompts/nl/data-pipeline.md +78 -0
- package/prompts/system-prompts/nl/saas-backend.md +71 -0
- package/prompts/system-prompts/saas-backend.md +69 -0
- package/prompts/task-specific/changelog.md +81 -0
- package/prompts/task-specific/de/changelog.md +83 -0
- package/prompts/task-specific/de/debugging.md +78 -0
- package/prompts/task-specific/de/pr-description.md +69 -0
- package/prompts/task-specific/debugging.md +76 -0
- package/prompts/task-specific/es/changelog.md +83 -0
- package/prompts/task-specific/es/debugging.md +78 -0
- package/prompts/task-specific/es/pr-description.md +69 -0
- package/prompts/task-specific/fr/changelog.md +83 -0
- package/prompts/task-specific/fr/debugging.md +78 -0
- package/prompts/task-specific/fr/pr-description.md +69 -0
- package/prompts/task-specific/nl/changelog.md +83 -0
- package/prompts/task-specific/nl/debugging.md +78 -0
- package/prompts/task-specific/nl/pr-description.md +69 -0
- package/prompts/task-specific/pr-description.md +67 -0
- package/rules/common/coding-style.md +45 -0
- package/rules/common/de/coding-style.md +47 -0
- package/rules/common/de/git.md +48 -0
- package/rules/common/de/performance.md +40 -0
- package/rules/common/de/security.md +45 -0
- package/rules/common/de/testing.md +45 -0
- package/rules/common/es/coding-style.md +47 -0
- package/rules/common/es/git.md +48 -0
- package/rules/common/es/performance.md +40 -0
- package/rules/common/es/security.md +45 -0
- package/rules/common/es/testing.md +45 -0
- package/rules/common/fr/coding-style.md +47 -0
- package/rules/common/fr/git.md +48 -0
- package/rules/common/fr/performance.md +40 -0
- package/rules/common/fr/security.md +45 -0
- package/rules/common/fr/testing.md +45 -0
- package/rules/common/git.md +46 -0
- package/rules/common/nl/coding-style.md +47 -0
- package/rules/common/nl/git.md +48 -0
- package/rules/common/nl/performance.md +40 -0
- package/rules/common/nl/security.md +45 -0
- package/rules/common/nl/testing.md +45 -0
- package/rules/common/performance.md +38 -0
- package/rules/common/security.md +43 -0
- package/rules/common/testing.md +43 -0
- package/rules/language-specific/de/go.md +48 -0
- package/rules/language-specific/de/python.md +38 -0
- package/rules/language-specific/de/typescript.md +51 -0
- package/rules/language-specific/es/go.md +48 -0
- package/rules/language-specific/es/python.md +38 -0
- package/rules/language-specific/es/typescript.md +51 -0
- package/rules/language-specific/fr/go.md +48 -0
- package/rules/language-specific/fr/python.md +38 -0
- package/rules/language-specific/fr/typescript.md +51 -0
- package/rules/language-specific/go.md +46 -0
- package/rules/language-specific/nl/go.md +48 -0
- package/rules/language-specific/nl/python.md +38 -0
- package/rules/language-specific/nl/typescript.md +51 -0
- package/rules/language-specific/python.md +36 -0
- package/rules/language-specific/typescript.md +49 -0
- package/scripts/cli.js +161 -0
- package/scripts/link-skills.sh +35 -0
- package/scripts/list-skills.sh +34 -0
- package/skills/ai-engineering/agent-construction.md +285 -0
- package/skills/ai-engineering/claude-api.md +248 -0
- package/skills/ai-engineering/de/agent-construction.md +287 -0
- package/skills/ai-engineering/de/claude-api.md +250 -0
- package/skills/ai-engineering/es/agent-construction.md +287 -0
- package/skills/ai-engineering/es/claude-api.md +250 -0
- package/skills/ai-engineering/fr/agent-construction.md +287 -0
- package/skills/ai-engineering/fr/claude-api.md +250 -0
- package/skills/ai-engineering/nl/agent-construction.md +287 -0
- package/skills/ai-engineering/nl/claude-api.md +250 -0
- package/skills/backend/dotnet/csharp.md +304 -0
- package/skills/backend/dotnet/de/csharp.md +306 -0
- package/skills/backend/dotnet/es/csharp.md +306 -0
- package/skills/backend/dotnet/fr/csharp.md +306 -0
- package/skills/backend/dotnet/nl/csharp.md +306 -0
- package/skills/backend/go/de/go.md +307 -0
- package/skills/backend/go/es/go.md +307 -0
- package/skills/backend/go/fr/go.md +307 -0
- package/skills/backend/go/go.md +305 -0
- package/skills/backend/go/nl/go.md +307 -0
- package/skills/backend/nodejs/de/nestjs.md +274 -0
- package/skills/backend/nodejs/de/nextjs.md +222 -0
- package/skills/backend/nodejs/es/nestjs.md +274 -0
- package/skills/backend/nodejs/es/nextjs.md +222 -0
- package/skills/backend/nodejs/fr/nestjs.md +274 -0
- package/skills/backend/nodejs/fr/nextjs.md +222 -0
- package/skills/backend/nodejs/nestjs.md +272 -0
- package/skills/backend/nodejs/nextjs.md +220 -0
- package/skills/backend/nodejs/nl/nestjs.md +274 -0
- package/skills/backend/nodejs/nl/nextjs.md +222 -0
- package/skills/backend/python/de/django.md +285 -0
- package/skills/backend/python/de/fastapi.md +244 -0
- package/skills/backend/python/django.md +283 -0
- package/skills/backend/python/es/django.md +285 -0
- package/skills/backend/python/es/fastapi.md +244 -0
- package/skills/backend/python/fastapi.md +242 -0
- package/skills/backend/python/fr/django.md +285 -0
- package/skills/backend/python/fr/fastapi.md +244 -0
- package/skills/backend/python/nl/django.md +285 -0
- package/skills/backend/python/nl/fastapi.md +244 -0
- package/skills/data-ml/dbt-data-pipelines.md +155 -0
- package/skills/data-ml/de/dbt-data-pipelines.md +157 -0
- package/skills/data-ml/de/pandas-polars.md +147 -0
- package/skills/data-ml/de/pytorch-tensorflow.md +171 -0
- package/skills/data-ml/es/dbt-data-pipelines.md +157 -0
- package/skills/data-ml/es/pandas-polars.md +147 -0
- package/skills/data-ml/es/pytorch-tensorflow.md +171 -0
- package/skills/data-ml/fr/dbt-data-pipelines.md +157 -0
- package/skills/data-ml/fr/pandas-polars.md +147 -0
- package/skills/data-ml/fr/pytorch-tensorflow.md +171 -0
- package/skills/data-ml/nl/dbt-data-pipelines.md +157 -0
- package/skills/data-ml/nl/pandas-polars.md +147 -0
- package/skills/data-ml/nl/pytorch-tensorflow.md +171 -0
- package/skills/data-ml/pandas-polars.md +145 -0
- package/skills/data-ml/pytorch-tensorflow.md +169 -0
- package/skills/database/de/graphql.md +181 -0
- package/skills/database/es/graphql.md +181 -0
- package/skills/database/fr/graphql.md +181 -0
- package/skills/database/graphql.md +179 -0
- package/skills/database/nl/graphql.md +181 -0
- package/skills/devops-infra/de/docker.md +133 -0
- package/skills/devops-infra/de/github-actions.md +179 -0
- package/skills/devops-infra/de/kubernetes.md +129 -0
- package/skills/devops-infra/de/terraform.md +130 -0
- package/skills/devops-infra/docker.md +131 -0
- package/skills/devops-infra/es/docker.md +133 -0
- package/skills/devops-infra/es/github-actions.md +179 -0
- package/skills/devops-infra/es/kubernetes.md +129 -0
- package/skills/devops-infra/es/terraform.md +130 -0
- package/skills/devops-infra/fr/docker.md +133 -0
- package/skills/devops-infra/fr/github-actions.md +179 -0
- package/skills/devops-infra/fr/kubernetes.md +129 -0
- package/skills/devops-infra/fr/terraform.md +130 -0
- package/skills/devops-infra/github-actions.md +177 -0
- package/skills/devops-infra/kubernetes.md +127 -0
- package/skills/devops-infra/nl/docker.md +133 -0
- package/skills/devops-infra/nl/github-actions.md +179 -0
- package/skills/devops-infra/nl/kubernetes.md +129 -0
- package/skills/devops-infra/nl/terraform.md +130 -0
- package/skills/devops-infra/terraform.md +128 -0
- package/skills/finance-payments/de/stripe.md +187 -0
- package/skills/finance-payments/es/stripe.md +187 -0
- package/skills/finance-payments/fr/stripe.md +187 -0
- package/skills/finance-payments/nl/stripe.md +187 -0
- package/skills/finance-payments/stripe.md +185 -0
- package/workflows/code-review.md +151 -0
- package/workflows/de/code-review.md +153 -0
- package/workflows/de/debugging-session.md +146 -0
- package/workflows/de/feature-development.md +155 -0
- package/workflows/de/new-project-bootstrap.md +175 -0
- package/workflows/de/refactor-safely.md +150 -0
- package/workflows/debugging-session.md +144 -0
- package/workflows/es/code-review.md +153 -0
- package/workflows/es/debugging-session.md +146 -0
- package/workflows/es/feature-development.md +155 -0
- package/workflows/es/new-project-bootstrap.md +175 -0
- package/workflows/es/refactor-safely.md +150 -0
- package/workflows/feature-development.md +153 -0
- package/workflows/fr/code-review.md +153 -0
- package/workflows/fr/debugging-session.md +146 -0
- package/workflows/fr/feature-development.md +155 -0
- package/workflows/fr/new-project-bootstrap.md +175 -0
- package/workflows/fr/refactor-safely.md +150 -0
- package/workflows/new-project-bootstrap.md +173 -0
- package/workflows/nl/code-review.md +153 -0
- package/workflows/nl/debugging-session.md +146 -0
- package/workflows/nl/feature-development.md +155 -0
- package/workflows/nl/new-project-bootstrap.md +175 -0
- package/workflows/nl/refactor-safely.md +150 -0
- package/workflows/refactor-safely.md +148 -0
@@ -0,0 +1,244 @@
> 🇳🇱 This is the Dutch translation. [English version](../fastapi.md).

# FastAPI Skill

## When to activate

- Building a Python REST or async API with FastAPI
- Defining Pydantic request/response models
- Setting up dependency injection with `Depends`
- Writing async route handlers with SQLAlchemy or async DB drivers
- Adding middleware (CORS, auth, logging, rate limiting)
- Configuring background tasks or Celery workers
- Customizing OpenAPI documentation (tags, descriptions, response schemas)
- Writing integration tests with `TestClient` or `AsyncClient`
- Structuring a multi-module FastAPI project

## When NOT to use

- Django/DRF projects: use the Django skill
- Sync-only codebases where the async overhead isn't justified
- Simple scripts that don't need HTTP: use plain Python
- gRPC or GraphQL APIs: different transport and schema layer

## Instructions

### Project structure

```
app/
├── main.py              # FastAPI app factory
├── core/
│   ├── config.py        # Settings via pydantic-settings
│   └── security.py      # JWT, hashing utilities
├── api/
│   ├── deps.py          # Shared Depends() functions
│   └── v1/
│       ├── router.py    # APIRouter aggregator
│       └── endpoints/
│           ├── users.py
│           └── items.py
├── models/              # SQLAlchemy ORM models
├── schemas/             # Pydantic request/response schemas
├── crud/                # DB operation functions (not ORM, not HTTP)
└── db/
    ├── session.py       # AsyncSession factory
    └── base.py          # Declarative base import
```

### App factory

```python
# main.py
from fastapi import FastAPI

from app.api.v1.router import api_router
from app.core.config import settings


def create_app() -> FastAPI:
    app = FastAPI(
        title=settings.PROJECT_NAME,
        openapi_url=f"{settings.API_V1_STR}/openapi.json",
        docs_url="/docs" if settings.ENVIRONMENT != "production" else None,
    )
    app.include_router(api_router, prefix=settings.API_V1_STR)
    return app


app = create_app()
```

### Settings with pydantic-settings

```python
# core/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # pydantic-settings v2 style; replaces the deprecated inner `class Config`
    model_config = SettingsConfigDict(env_file=".env", case_sensitive=True)

    PROJECT_NAME: str = "MyAPI"
    API_V1_STR: str = "/api/v1"
    DATABASE_URL: str  # required: no default, must come from the environment or .env
    SECRET_KEY: str  # required
    ENVIRONMENT: str = "development"
    ACCESS_TOKEN_EXPIRE_MINUTES: int = 30
    ALLOWED_ORIGINS: list[str] = []  # read by the CORS middleware; set as a JSON list in .env


settings = Settings()
```

### Async SQLAlchemy session dependency

```python
# db/session.py
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

from app.core.config import settings

engine = create_async_engine(settings.DATABASE_URL, echo=False)
AsyncSessionLocal = async_sessionmaker(engine, expire_on_commit=False)

# api/deps.py
from collections.abc import AsyncIterator

from sqlalchemy.ext.asyncio import AsyncSession

from app.db.session import AsyncSessionLocal


async def get_db() -> AsyncIterator[AsyncSession]:
    async with AsyncSessionLocal() as session:
        try:
            yield session
            await session.commit()  # commit only if the route handler finished without raising
        except Exception:
            await session.rollback()  # undo partial writes, then let the error propagate
            raise
```
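
The commit sits after `yield` and the rollback sits in the `except` branch. To see why, here is a stdlib-only sketch that drives a generator shaped like `get_db` with `contextlib.asynccontextmanager`. `FakeSession` and `run_handler` are hypothetical stand-ins, and no database is involved. Unlike the real dependency, this `get_db` receives the session as a parameter. The driver only approximates how FastAPI runs dependencies with `yield`: code after `yield` runs once the handler finishes, and an exception raised in the handler is re-raised inside the generator.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in for AsyncSession that records what happened to it."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def commit(self) -> None:
        self.events.append("commit")

    async def rollback(self) -> None:
        self.events.append("rollback")


async def get_db(session: FakeSession) -> AsyncIterator[FakeSession]:
    # Same shape as the dependency above: commit after the handler, roll back on error
    try:
        yield session
        await session.commit()
    except Exception:
        await session.rollback()
        raise


async def run_handler(session: FakeSession, fail: bool) -> None:
    # Exiting the context resumes the generator; an exception is thrown into it at `yield`
    async with asynccontextmanager(get_db)(session) as db:
        db.events.append("handler")
        if fail:
            raise RuntimeError("handler failed")


ok = FakeSession()
asyncio.run(run_handler(ok, fail=False))
print(ok.events)  # ['handler', 'commit']
```

A failing handler records `['handler', 'rollback']` instead, and the original `RuntimeError` still propagates to the caller.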

### Route handlers

```python
# api/v1/endpoints/users.py
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy.ext.asyncio import AsyncSession

from app.api.deps import get_db, get_current_user
from app.crud import user as crud_user
from app.schemas.user import UserCreate, UserResponse

router = APIRouter(prefix="/users", tags=["users"])


@router.post("/", response_model=UserResponse, status_code=status.HTTP_201_CREATED)
async def create_user(
    payload: UserCreate,
    db: AsyncSession = Depends(get_db),
) -> UserResponse:
    if await crud_user.get_by_email(db, email=payload.email):
        raise HTTPException(status_code=400, detail="Email already registered")
    return await crud_user.create(db, obj_in=payload)
```

### Dependency injection for auth

```python
# api/deps.py
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt
from sqlalchemy.ext.asyncio import AsyncSession

from app.core.config import settings
from app.crud import user as crud_user
from app.models.user import User  # assumed location of the User ORM model

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/token")

credentials_exception = HTTPException(
    status_code=status.HTTP_401_UNAUTHORIZED,
    detail="Invalid credentials",
    headers={"WWW-Authenticate": "Bearer"},
)


async def get_current_user(
    token: str = Depends(oauth2_scheme),
    db: AsyncSession = Depends(get_db),  # get_db is defined earlier in this module
) -> User:
    try:
        payload = jwt.decode(token, settings.SECRET_KEY, algorithms=["HS256"])
    except JWTError:
        raise credentials_exception
    user_id: str | None = payload.get("sub")
    if user_id is None:
        raise credentials_exception
    user = await crud_user.get(db, id=user_id)
    if user is None:
        # 401 rather than 404, so the response doesn't reveal whether the account exists
        raise credentials_exception
    return user
```
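
In the dependency above, `jwt.decode` from python-jose does the verification. As a rough illustration of what that involves for HS256, the stdlib-only sketch below does four things:

- checks the segment layout
- pins the algorithm
- compares the HMAC-SHA256 signature in constant time
- rejects expired tokens

`encode_hs256`, `decode_hs256`, and `InvalidToken` are hypothetical names. This is not python-jose's implementation and is no substitute for a maintained JWT library; for example, it skips the `nbf`, `aud`, and `iss` claims.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


class InvalidToken(Exception):
    """Raised for malformed, forged, or expired tokens (hypothetical error type)."""


def _b64url_encode(data: bytes) -> str:
    # JWTs use unpadded base64url
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWT encoding strips
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def encode_hs256(claims: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url_encode(_sign(signing_input, secret))}"


def decode_hs256(token: str, secret: str, now: float | None = None) -> dict:
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:  # wrong segment count, bad base64, or bad JSON
        raise InvalidToken("malformed token") from exc
    # The server pins the algorithm instead of trusting the token's own "alg" header
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise InvalidToken("unexpected algorithm")
    # Constant-time comparison avoids leaking how many signature bytes matched
    if not hmac.compare_digest(_sign(f"{header_b64}.{payload_b64}", secret), signature):
        raise InvalidToken("bad signature")
    if not isinstance(claims, dict):
        raise InvalidToken("claims must be a JSON object")
    exp = claims.get("exp")
    if exp is not None and (time.time() if now is None else now) >= exp:
        raise InvalidToken("token expired")
    return claims
```

Pinning the algorithm matters for the same reason `algorithms=["HS256"]` does in the dependency: the server, not the token header, decides how the signature is checked.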

### Background tasks

```python
from fastapi import BackgroundTasks

# router, EmailPayload (a Pydantic model), and send_email (a plain function) are defined elsewhere in the app


# Use FastAPI's BackgroundTasks for lightweight fire-and-forget work (no result needed)
@router.post("/send-email")
async def send_email_endpoint(
    payload: EmailPayload,
    background_tasks: BackgroundTasks,
):
    # The task runs in the same process after the response has been sent
    background_tasks.add_task(send_email, payload.to, payload.subject, payload.body)
    return {"status": "queued"}


# Use Celery for: retries, result tracking, scheduling, cross-service tasks
```

### CORS middleware

```python
from fastapi.middleware.cors import CORSMiddleware

from app.core.config import settings

# app is the FastAPI instance created in main.py
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.ALLOWED_ORIGINS,  # Never ["*"] in production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
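
For each request, `CORSMiddleware` decides whether the calling origin may read the response. The simplified sketch below models only that origin decision, using an explicit allowlist. It ignores preflight handling of methods and headers and is not Starlette's implementation. `cors_response_headers` is a hypothetical name. The sketch also makes one modeling assumption: a `"*"` entry accepts any origin and echoes it back. That assumption illustrates why the wildcard is risky once credentials are allowed, since any site could then read authenticated responses.

```python
from __future__ import annotations


def cors_response_headers(
    request_origin: str | None,
    allowed_origins: list[str],
    allow_credentials: bool,
) -> dict[str, str]:
    """Pick CORS response headers for one simple request (no preflight handling)."""
    # The response differs per Origin, so shared caches must key on it
    headers = {"Vary": "Origin"}
    allow_any = "*" in allowed_origins  # modeling assumption: a wildcard accepts every origin
    if request_origin is None or not (allow_any or request_origin in allowed_origins):
        # No Access-Control-Allow-Origin header: the browser blocks the page from reading the response
        return headers
    # Echo the concrete origin; browsers refuse a literal "*" on credentialed requests
    headers["Access-Control-Allow-Origin"] = request_origin
    if allow_credentials:
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers


print(cors_response_headers("https://app.example.com", ["https://app.example.com"], True))
```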

### Custom exception handlers

```python
from fastapi import Request
from fastapi.responses import JSONResponse


# app is the FastAPI instance created in main.py
@app.exception_handler(ValueError)
async def value_error_handler(request: Request, exc: ValueError) -> JSONResponse:
    return JSONResponse(status_code=422, content={"detail": str(exc)})
```

### Testing

```python
# tests/conftest.py
from collections.abc import AsyncIterator

import pytest_asyncio
from httpx import ASGITransport, AsyncClient

from app.main import app


@pytest_asyncio.fixture  # async fixtures need pytest-asyncio's decorator in its default strict mode
async def client() -> AsyncIterator[AsyncClient]:
    async with AsyncClient(
        transport=ASGITransport(app=app), base_url="http://test"
    ) as ac:
        yield ac


# tests/test_users.py
import pytest
from httpx import AsyncClient


@pytest.mark.asyncio
async def test_create_user(client: AsyncClient, db_session):  # db_session: test-database fixture, not shown
    # The password must pass the 8-character minimum enforced by UserCreate
    resp = await client.post("/api/v1/users/", json={"email": "a@b.com", "password": "supersecret"})
    assert resp.status_code == 201
    assert resp.json()["email"] == "a@b.com"
```

### Common Pydantic patterns

```python
from pydantic import BaseModel, EmailStr, field_validator


class UserCreate(BaseModel):
    email: EmailStr  # requires the email-validator package
    password: str

    @field_validator("password")
    @classmethod
    def password_strength(cls, v: str) -> str:
        if len(v) < 8:
            raise ValueError("Password must be at least 8 characters")
        return v


class UserResponse(BaseModel):
    id: int
    email: EmailStr

    model_config = {"from_attributes": True}  # replaces orm_mode = True
```

## Example

**User:** Build a FastAPI endpoint to create a blog post, authenticated with JWT and stored in PostgreSQL with async SQLAlchemy.

**Expected structure:**

- `schemas/post.py`: `PostCreate(BaseModel)`, `PostResponse(BaseModel)` with `from_attributes=True`
- `models/post.py`: `Post` ORM model with `id`, `title`, `body`, `author_id` (FK to User), `created_at`
- `crud/post.py`: async `create(db, *, obj_in, author_id)` function
- `api/v1/endpoints/posts.py`: `POST /posts/` with `Depends(get_current_user)` and `Depends(get_db)`
- `api/v1/router.py`: include the posts router

---

> **Work with us:** Claudient is backed by [Uitbreiden](https://uitbreiden.com/). We build AI products and B2B solutions with developer communities. [uitbreiden.com](https://uitbreiden.com/)
@@ -0,0 +1,155 @@
# dbt Data Pipelines Skill

## When to activate

- Writing dbt models (staging, intermediate, and mart layers)
- Configuring dbt sources, refs, and dependencies
- Writing dbt tests (schema tests, singular tests, custom generic tests)
- Setting up dbt project structure for a new data warehouse
- Writing dbt macros for reusable SQL logic
- Configuring dbt documentation and freshness checks
- Debugging dbt compilation errors or failed model runs
- Setting up dbt with BigQuery, Snowflake, Redshift, or DuckDB

## When NOT to use

- Raw ETL pipelines without a warehouse (use Airflow, Prefect, or Dagster instead)
- Real-time streaming data (dbt is batch-only)
- Pandas/Polars in-memory transformations (use the pandas-polars skill)
- Data ingestion (dbt transforms data; it doesn't ingest it)

## Instructions

### Project layer architecture

Always separate models into three layers:

```
models/
├── staging/              ← 1:1 with source tables. Light cleaning only. No joins.
│   ├── stg_orders.sql
│   └── stg_customers.sql
├── intermediate/         ← Business logic. Joins allowed. Not exposed to BI tools.
│   └── int_orders_with_customers.sql
└── marts/                ← Final business entities. Exposed to BI. Aggregations live here.
    ├── finance/
    │   └── fct_revenue.sql
    └── marketing/
        └── dim_customers.sql
```

**Staging rules:**

- Rename columns to project conventions (snake_case)
- Cast types explicitly
- No business logic: no joins, no aggregations
- Prefix with `stg_`

**Mart rules:**

- `fct_` prefix for fact tables (events, transactions)
- `dim_` prefix for dimension tables (customers, products)
- Always document in `schema.yml`
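
The prefix conventions are easy to enforce in CI. Here is a minimal Python sketch of such a check. It assumes models live under `models/<layer>/...`. It also assumes intermediate models use the `int_` prefix seen in the example tree; the rules above don't state that prefix explicitly. `LAYER_PREFIXES` and `naming_violations` are hypothetical names, and dbt itself doesn't enforce these file names.

```python
from pathlib import PurePosixPath

# Hypothetical rule table: layer folder -> allowed model-name prefixes
LAYER_PREFIXES = {
    "staging": ("stg_",),
    "intermediate": ("int_",),  # assumption taken from the example tree
    "marts": ("fct_", "dim_"),
}


def naming_violations(model_paths: list[str]) -> list[str]:
    """Return .sql model paths whose file name doesn't match its layer's prefixes."""
    violations = []
    for raw in model_paths:
        path = PurePosixPath(raw)
        parts = path.parts
        # Only check SQL files laid out as models/<layer>/.../<name>.sql
        if len(parts) < 3 or parts[0] != "models" or path.suffix != ".sql":
            continue
        prefixes = LAYER_PREFIXES.get(parts[1])
        if prefixes and not path.stem.startswith(prefixes):
            violations.append(raw)
    return violations


print(naming_violations(["models/staging/orders_clean.sql", "models/staging/stg_orders.sql"]))
```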

### Model configuration

```sql
-- models/marts/finance/fct_revenue.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        on_schema_change='fail'
    )
}}

with orders as (
    select * from {{ ref('int_orders_with_customers') }}
    {% if is_incremental() %}
    -- on incremental runs, only pick up rows newer than what is already in the table
    where created_at > (select max(created_at) from {{ this }})
    {% endif %}
)

select
    order_id,
    customer_id,
    amount,
    created_at
from orders
```

**Materialization choices:**

- `view`: the default; good for staging and intermediate models
- `table`: for expensive models that are queried frequently
- `incremental`: for large fact tables that grow over time
- `ephemeral`: not materialized; dbt inlines the model as a CTE wherever it is referenced. Use for simple transformations referenced once
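
Together, the `is_incremental()` filter and `unique_key` decide what an incremental run writes. The toy Python model below illustrates that behavior using in-memory rows, no warehouse, and a hypothetical `incremental_merge` function. It shows three consequences:

- Rows with an existing `order_id` that pass the filter replace the old row rather than duplicating it.
- Late-arriving rows whose `created_at` is at or below the current maximum are skipped.
- The first build loads the whole `select` without de-duplicating.

Real adapters implement the merge step with warehouse-specific SQL, so treat this as a sketch of the semantics, not of dbt's code.

```python
from datetime import datetime


def incremental_merge(existing: list[dict], source: list[dict], unique_key: str = "order_id") -> list[dict]:
    """Toy model of one dbt incremental run against an in-memory target table."""
    if not existing:
        # An empty target stands in for the first build: is_incremental() is false,
        # the whole select is written as-is, and unique_key does not de-duplicate it
        return list(source)
    # Mirrors: where created_at > (select max(created_at) from {{ this }})
    high_water = max(row["created_at"] for row in existing)
    new_rows = [row for row in source if row["created_at"] > high_water]
    # Upsert on unique_key: a matching key replaces the old row instead of adding a duplicate
    # (if the new batch repeats a key, this toy keeps the last row; real adapters may differ)
    merged = {row[unique_key]: row for row in existing}
    for row in new_rows:
        merged[row[unique_key]] = row
    return list(merged.values())


existing = [{"order_id": 1, "amount": 10, "created_at": datetime(2024, 1, 1)}]
source = existing + [
    {"order_id": 1, "amount": 12, "created_at": datetime(2024, 1, 3)},  # newer: replaces the row
    {"order_id": 2, "amount": 20, "created_at": datetime(2024, 1, 1)},  # late-arriving: skipped
]
print(incremental_merge(existing, source))
```

The skipped late row is the usual price of a strict `>` filter on `created_at`. Filtering on a lookback window or on an `updated_at` column are common alternatives when late data matters.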

### Testing: required on every mart model

```yaml
# models/marts/finance/schema.yml
version: 2

models:
  - name: fct_revenue
    description: "One row per completed order"
    columns:
      - name: order_id
        description: "Primary key"
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: amount
        tests:
          - not_null
          - dbt_utils.accepted_range:  # requires the dbt_utils package
              min_value: 0
```

Minimum tests on every mart model: `unique` + `not_null` on the primary key, and `not_null` on critical foreign keys.
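
Each of these generic tests compiles to a query that selects failing rows, and the test passes when that query returns nothing. The Python sketch below mirrors that "select the failures" logic on in-memory rows; the function names are hypothetical. Note that `unique`, `relationships`, and `dbt_utils.accepted_range` all ignore NULLs, which is why each is paired with `not_null` above.

```python
from collections import Counter


def failing_unique(rows: list[dict], column: str) -> list:
    # unique: non-null values that appear more than once
    counts = Counter(row[column] for row in rows if row[column] is not None)
    return [value for value, n in counts.items() if n > 1]


def failing_not_null(rows: list[dict], column: str) -> list[dict]:
    # not_null: every row where the column is NULL
    return [row for row in rows if row[column] is None]


def failing_relationships(rows: list[dict], column: str, parent_rows: list[dict], parent_column: str) -> list[dict]:
    # relationships: non-null child values with no matching key in the parent model
    parent_keys = {row[parent_column] for row in parent_rows}
    return [row for row in rows if row[column] is not None and row[column] not in parent_keys]


def failing_accepted_range(rows: list[dict], column: str, min_value: float) -> list[dict]:
    # accepted_range with only min_value (inclusive): non-null values below the minimum
    return [row for row in rows if row[column] is not None and row[column] < min_value]


orders = [
    {"order_id": 1, "customer_id": 10, "amount": 5},
    {"order_id": 1, "customer_id": 11, "amount": -1},
    {"order_id": None, "customer_id": 99, "amount": 3},
]
print(failing_unique(orders, "order_id"))  # [1]
```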

### Sources configuration

```yaml
# models/staging/sources.yml
version: 2

sources:
  - name: raw_stripe
    database: raw
    schema: stripe
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _ingested_at
    tables:
      - name: charges
        description: "Raw Stripe charges from Fivetran"
```

Always set `freshness` on sources, and run `dbt source freshness` in your scheduled jobs; stale source data is otherwise a silent failure.
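
`dbt source freshness` compares the newest `loaded_at_field` value with the current time and reports pass, warn, or error against the configured thresholds. Here is a minimal Python sketch of that classification, using the 12-hour and 24-hour values from the config above. `freshness_status` is a hypothetical helper, not part of dbt.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(
    max_loaded_at: datetime,
    now: datetime,
    warn_after: timedelta = timedelta(hours=12),
    error_after: timedelta = timedelta(hours=24),
) -> str:
    # Age of the newest row, measured from its loaded_at_field value (_ingested_at above)
    age = now - max_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
print(freshness_status(now - timedelta(hours=13), now))  # warn
```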

### Macros for reusable logic

```sql
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100.0)::numeric(10, 2)
{% endmacro %}

-- Usage in a model
select
    {{ cents_to_dollars('amount_cents') }} as amount_dollars
from orders
```
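
The `100.0` in the macro matters. In Postgres and Redshift, dividing an integer column by the integer `100` truncates (`1999 / 100` is `19`). The cast to `numeric(10, 2)` then fixes two decimal places. In Postgres, values whose magnitude exceeds 99,999,999.99 fail with a numeric overflow error. The `::` shorthand is also Postgres-style, so on BigQuery the macro would need `cast(... as numeric)` instead.

Here is a Python sketch of the same arithmetic, with `Decimal` standing in for the warehouse's numeric type. `cents_to_dollars` mirrors the macro's name; `MAX_NUMERIC_10_2` is a hypothetical constant.

```python
from decimal import Decimal

MAX_NUMERIC_10_2 = Decimal("99999999.99")  # largest magnitude numeric(10, 2) can store


def cents_to_dollars(amount_cents: int) -> Decimal:
    # Exact decimal division, then fix the scale at two places like ::numeric(10, 2)
    dollars = (Decimal(amount_cents) / 100).quantize(Decimal("0.01"))
    if abs(dollars) > MAX_NUMERIC_10_2:
        # A Postgres-style warehouse raises a numeric overflow error here instead of returning a value
        raise OverflowError(f"{dollars} does not fit in numeric(10, 2)")
    return dollars


print(cents_to_dollars(1999), 1999 // 100)  # 19.99 19
```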

## Example

**User:** Create staging and mart models for Stripe payments data (charges, refunds) with tests and freshness checks.

**Expected output:**

- `models/staging/stripe/sources.yml`: source with a freshness check on `_ingested_at`
- `models/staging/stripe/stg_stripe_charges.sql`: rename, cast, no joins
- `models/staging/stripe/stg_stripe_refunds.sql`: same pattern
- `models/marts/finance/fct_payments.sql`: join charges + refunds, net amount, incremental materialization
- `models/marts/finance/schema.yml`: `unique` + `not_null` on `charge_id`, relationship test on `customer_id`

---

> **Work with us:** Claudient is backed by [Uitbreiden](https://uitbreiden.com/). We build AI products and B2B solutions with developer communities. Building data pipelines for AI or analytics products? [uitbreiden.com](https://uitbreiden.com/)
@@ -0,0 +1,157 @@
|
|
|
1
|
+
> 🇩🇪 Dies ist die deutsche Übersetzung. [Englische Version](../dbt-data-pipelines.md).
|
|
2
|
+
|
|
3
|
+
# dbt Data Pipelines Skill
|
|
4
|
+
|
|
5
|
+
## Wann aktivieren
|
|
6
|
+
- dbt-Modelle schreiben (Staging-, Intermediate-, Mart-Layer)
|
|
7
|
+
- dbt Sources, Refs und Abhängigkeiten konfigurieren
|
|
8
|
+
- dbt-Tests schreiben (Schema-Tests, singuläre Tests, benutzerdefinierte generische Tests)
|
|
9
|
+
- dbt-Projektstruktur für ein neues Data Warehouse einrichten
|
|
10
|
+
- dbt-Makros für wiederverwendbare SQL-Logik schreiben
|
|
11
|
+
- dbt-Dokumentation und Frische-Prüfungen konfigurieren
|
|
12
|
+
- dbt-Kompilierungsfehler oder fehlgeschlagene Modell-Ausführungen debuggen
|
|
13
|
+
- dbt mit BigQuery, Snowflake, Redshift oder DuckDB einrichten

## When NOT to use

- Raw ETL pipelines without a warehouse (use Airflow, Prefect, or Dagster instead)
- Real-time streaming data (dbt is for batch processing only)
- In-memory pandas/Polars transformations (use the Pandas / Polars skill instead)
- Data ingestion (dbt transforms data that is already loaded; it does not ingest it)

## Instructions

### Project layer architecture

Always split models into three layers:

```
models/
├── staging/          ← 1:1 with source tables. Light cleanup only. No joins.
│   ├── stg_orders.sql
│   └── stg_customers.sql
├── intermediate/     ← Business logic. Joins allowed. Not exposed to BI tools.
│   └── int_orders_with_customers.sql
└── marts/            ← Final business entities. Exposed to BI. Aggregations go here.
    ├── finance/
    │   └── fct_revenue.sql
    └── marketing/
        └── dim_customers.sql
```

**Staging rules:**

- Rename columns to the project's conventions (snake_case)
- Cast types explicitly
- No business logic: no joins, no aggregations
- Use the `stg_` prefix
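
A staging model can be sketched per row in Python: it renames each raw column and casts it, with no joins or business logic. The mapping below is hypothetical and loosely modeled on Stripe's charge object (`id`, `customer`, `amount`, `currency`, and `created` as a Unix timestamp). Check the column names your loader actually writes. In dbt itself this is a plain `select` with aliases and casts.

```python
from datetime import datetime, timezone
from typing import Any, Callable

# Hypothetical raw column -> (staged column, cast) mapping, loosely based on
# Stripe's charge object. Adjust to the columns your loader actually writes.
STG_CHARGES_COLUMNS: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "id": ("charge_id", str),
    "customer": ("customer_id", str),
    "amount": ("amount_cents", int),
    "currency": ("currency", lambda v: str(v).lower()),
    "created": ("created_at", lambda v: datetime.fromtimestamp(int(v), tz=timezone.utc)),
}


def stage_row(raw: dict[str, Any]) -> dict[str, Any]:
    """Rename and cast one raw row. No joins, no aggregation, no business logic."""
    staged: dict[str, Any] = {}
    for raw_name, (staged_name, cast) in STG_CHARGES_COLUMNS.items():
        value = raw.get(raw_name)
        # Keep nulls as nulls; only cast real values
        staged[staged_name] = None if value is None else cast(value)
    return staged  # unmapped raw columns are dropped


row = stage_row({"id": "ch_1", "customer": "cus_1", "amount": "1250",
                 "currency": "EUR", "created": 1700000000, "metadata": "{}"})
print(row)
```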

**Mart rules:**

- `fct_` prefix for fact tables (events, transactions)
- `dim_` prefix for dimension tables (customers, products)
- Always document marts in `schema.yml`
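
The prefix conventions are easy to check mechanically, for example in CI. Here is a small sketch of such a check. The `check_model_name` helper is hypothetical. The `int_` prefix for intermediate models is inferred from the example tree rather than stated as a rule.

```python
from pathlib import PurePosixPath

# Allowed prefixes per layer. "int_" is inferred from the example tree.
LAYER_PREFIXES = {
    "staging": ("stg_",),
    "intermediate": ("int_",),
    "marts": ("fct_", "dim_"),
}


def check_model_name(path: str) -> list[str]:
    """Return naming problems for a path such as 'models/marts/finance/fct_revenue.sql'."""
    p = PurePosixPath(path)
    if len(p.parts) < 3 or p.parts[0] != "models" or p.suffix != ".sql":
        return [f"{path}: expected models/<layer>/.../<name>.sql"]
    layer = p.parts[1]
    allowed = LAYER_PREFIXES.get(layer)
    if allowed is None:
        return [f"{path}: unknown layer '{layer}'"]
    if not p.stem.startswith(allowed):
        return [f"{path}: expected prefix {' or '.join(allowed)}"]
    return []


for model in ["models/marts/finance/fct_revenue.sql", "models/marts/finance/revenue.sql"]:
    print(model, check_model_name(model))
```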

### Model configuration

```sql
-- models/marts/finance/fct_revenue.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        on_schema_change='fail'
    )
}}

with orders as (
    select * from {{ ref('int_orders_with_customers') }}
    {% if is_incremental() %}
    -- on incremental runs, only pick up rows newer than the latest row already in this table
    where created_at > (select max(created_at) from {{ this }})
    {% endif %}
)

select
    order_id,
    customer_id,
    amount,
    created_at
from orders
```
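
The following plain-Python simulation of one run shows what `is_incremental()`, the `max(created_at)` filter, and `unique_key` do together. It is a conceptual sketch, not how dbt executes the model. In practice the warehouse performs a merge or delete+insert, depending on the adapter and incremental strategy.

```python
from datetime import datetime
from typing import Any

Row = dict[str, Any]


def incremental_run(target: list[Row], source: list[Row], unique_key: str = "order_id") -> list[Row]:
    """Simulate one run of the incremental fct_revenue model above."""
    if not target:
        # First run (or --full-refresh): is_incremental() is false, so load everything
        new_rows = list(source)
    else:
        # where created_at > (select max(created_at) from {{ this }})
        watermark: datetime = max(row["created_at"] for row in target)
        new_rows = [row for row in source if row["created_at"] > watermark]
    # unique_key='order_id': a new row replaces the existing row with the same key
    merged = {row[unique_key]: row for row in target}
    for row in new_rows:
        merged[row[unique_key]] = row
    return list(merged.values())


source = [
    {"order_id": 1, "amount": 10, "created_at": datetime(2024, 1, 1)},
    {"order_id": 2, "amount": 20, "created_at": datetime(2024, 1, 2)},
]
target = incremental_run([], source)  # initial build

source += [
    {"order_id": 2, "amount": 25, "created_at": datetime(2024, 1, 3)},  # update to order 2
    {"order_id": 3, "amount": 30, "created_at": datetime(2024, 1, 3)},  # new order
    {"order_id": 4, "amount": 40, "created_at": datetime(2024, 1, 1)},  # late arrival, older than watermark
]
target = incremental_run(target, source)
print(sorted((r["order_id"], r["amount"]) for r in target))  # order 4 is skipped
```

The filter is a strict `>` against the current maximum. As a result, a row that arrives late with an older `created_at` is never picked up by later incremental runs; it appears only after a full refresh (`dbt run --full-refresh`).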

**Choosing a materialization:**

- `view`: the default; good for staging and intermediate models
- `table`: for expensive transformations that are queried frequently
- `incremental`: for large fact tables that grow over time
- `ephemeral`: inlined as a CTE and never materialized; use it for simple transformations that are referenced only once

### Tests: required for every mart model

```yaml
# models/marts/finance/schema.yml
version: 2

models:
  - name: fct_revenue
    description: "One row per completed order"
    columns:
      - name: order_id
        description: "Primary key"
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: amount
        tests:
          - not_null
          - dbt_utils.accepted_range:  # requires the dbt-utils package
              min_value: 0
```

Minimum tests for every mart model: `unique` + `not_null` on the primary key, and `not_null` on critical foreign keys.
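
dbt compiles each generic test to a query that selects the failing rows, and the test passes when that query returns no rows. The sketch below mirrors what the four tests above look for, over rows represented as Python dicts. It illustrates the semantics and is not dbt's implementation. As in dbt's built-in tests, null keys are not counted as duplicates or orphans here; they are left to `not_null`.

```python
from collections import Counter
from typing import Any, Iterable

Row = dict[str, Any]


def unique_failures(rows: Iterable[Row], column: str) -> list[Any]:
    """unique: non-null values that occur more than once."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return [value for value, n in counts.items() if n > 1]


def not_null_failures(rows: Iterable[Row], column: str) -> list[Row]:
    """not_null: rows where the column is null."""
    return [r for r in rows if r[column] is None]


def relationships_failures(rows: Iterable[Row], column: str,
                           parent: Iterable[Row], field: str) -> list[Row]:
    """relationships: non-null keys with no matching row in the parent model."""
    parent_keys = {p[field] for p in parent}
    return [r for r in rows if r[column] is not None and r[column] not in parent_keys]


def accepted_range_failures(rows: Iterable[Row], column: str, min_value: float) -> list[Row]:
    """dbt_utils.accepted_range with only min_value: non-null values below it."""
    return [r for r in rows if r[column] is not None and r[column] < min_value]


fct_revenue = [
    {"order_id": 1, "customer_id": "c1", "amount": 10},
    {"order_id": 1, "customer_id": "c9", "amount": -5},
    {"order_id": None, "customer_id": None, "amount": None},
]
dim_customers = [{"customer_id": "c1"}]
print(unique_failures(fct_revenue, "order_id"))  # [1]
```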

### Sources configuration

```yaml
# models/staging/sources.yml
version: 2

sources:
  - name: raw_stripe
    database: raw
    schema: stripe
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _ingested_at
    tables:
      - name: charges
        description: "Raw Stripe charges from Fivetran"
```

Always set `freshness` on sources: stale source data is a silent failure. The thresholds are evaluated when you run `dbt source freshness`.
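
Freshness takes the newest `loaded_at_field` value (`max(_ingested_at)`) and compares its age with the thresholds. The sketch below shows that classification using the 12 h / 24 h values from the config above. Whether an age exactly equal to a threshold counts as stale is an assumption here.

```python
from datetime import datetime, timedelta, timezone

# Thresholds from the sources.yml above
WARN_AFTER = timedelta(hours=12)
ERROR_AFTER = timedelta(hours=24)


def freshness_status(max_loaded_at: datetime, now: datetime) -> str:
    """Classify a source by the age of its newest row, i.e. max(_ingested_at)."""
    age = now - max_loaded_at
    if age > ERROR_AFTER:
        return "error"
    if age > WARN_AFTER:
        return "warn"
    return "pass"


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
for hours in (1, 13, 30):
    print(hours, freshness_status(now - timedelta(hours=hours), now))
```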

### Macros for reusable logic

```sql
-- macros/cents_to_dollars.sql
-- The ::numeric cast is Postgres-style syntax (Postgres, Redshift, Snowflake, DuckDB);
-- BigQuery does not support ::, so use cast() there.
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100.0)::numeric(10, 2)
{% endmacro %}

-- Usage in a model
select
    {{ cents_to_dollars('amount_cents') }} as amount_dollars
from orders
```
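
You can mirror the same conversion in Python with `decimal.Decimal`, which behaves like SQL `numeric`. This is useful for unit-testing code that consumes the macro's output or for checking its edge cases. The function is a hypothetical helper, not part of dbt. It divides by 100, pads the result to exactly two decimal places, and rejects values that would overflow `numeric(10, 2)` (at most 8 digits before the decimal point).

```python
from decimal import ROUND_HALF_UP, Decimal

TWO_PLACES = Decimal("0.01")
# numeric(10, 2) keeps 10 digits in total, 2 after the point: |value| < 10**8
NUMERIC_10_2_LIMIT = Decimal(10) ** 8


def cents_to_dollars(amount_cents: int) -> Decimal:
    """Python mirror of the macro: cents / 100 as an exact decimal with two places."""
    value = (Decimal(amount_cents) / 100).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
    if abs(value) >= NUMERIC_10_2_LIMIT:
        raise OverflowError(f"{value} does not fit in numeric(10, 2)")
    return value


print(cents_to_dollars(1999))  # 19.99
print(cents_to_dollars(-250))  # -2.50
```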

## Example

**User:** Create staging and mart models for Stripe payment data (charges, refunds) with tests and freshness checks.

**Expected output:**

- `models/staging/stripe/sources.yml`: source with freshness check on `_ingested_at`
- `models/staging/stripe/stg_stripe_charges.sql`: rename, cast, no joins
- `models/staging/stripe/stg_stripe_refunds.sql`: same pattern
- `models/marts/finance/fct_payments.sql`: join charges + refunds, net amount, incremental materialization
- `models/marts/finance/schema.yml`: `unique` + `not_null` on `charge_id`, relationship test on `customer_id`

---

> 🇩🇪 This is the German translation. [English version](../pandas-polars.md).

# Pandas / Polars Skill

## When to activate

- Cleaning, transforming, or aggregating tabular data in Python
- Merging, joining, or reshaping DataFrames
- Writing data validation or quality checks
- Converting between formats (CSV, Parquet, JSON, Excel)
- Profiling or exploring a new dataset
- Optimizing slow pandas code for large datasets
- Migrating pandas code to Polars to improve performance

## When NOT to use

- SQL in a database (push transformations down to the database when the data already lives there)
- Spark/distributed computing (use the PySpark skill for datasets larger than available RAM)
- dbt models (SQL-based transformations in a warehouse)
- NumPy array operations on non-tabular data

## Instructions

### Pandas: performance rules

```python
import pandas as pd

# Sample data so the snippet runs on its own
df = pd.DataFrame({
    'price': [10.0, 25.5, 8.0],
    'quantity': [1, 12, 3],
    'status': ['active', 'inactive', 'active'],
    'country': ['DE', 'NL', 'DE'],
})

# Never use iterrows(); vectorize instead
# Bad:
for idx, row in df.iterrows():
    df.at[idx, 'tax'] = row['price'] * 0.2

# Good:
df['tax'] = df['price'] * 0.2

# .loc for label-based access, .iloc for position-based access.
# Never assign through chained indexing (df[mask]['flag'] = True): it triggers
# SettingWithCopyWarning and may not modify df at all. Use a single .loc call:
df.loc[df['status'] == 'active', 'flag'] = True

# Categorical dtype for low-cardinality string columns (large memory savings)
df['country'] = df['country'].astype('category')

# Downcast numeric types to reduce memory usage
df['quantity'] = pd.to_numeric(df['quantity'], downcast='integer')
df['price'] = pd.to_numeric(df['price'], downcast='float')
```
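
You can verify the memory claims on real data with `Series.memory_usage(deep=True)`. The short sketch below uses synthetic values: three distinct country codes repeated 100,000 times.

```python
import pandas as pd

# Synthetic low-cardinality column: three distinct values, 300,000 rows
as_strings = pd.Series(['DE', 'NL', 'FR'] * 100_000)
as_category = as_strings.astype('category')

# deep=True also counts the string payloads, not just the column's pointers
string_bytes = as_strings.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(f"strings: {string_bytes:,} bytes, category: {category_bytes:,} bytes")

# Downcasting picks the smallest integer type that holds every value
quantities = pd.Series(range(1000))
small = pd.to_numeric(quantities, downcast='integer')
print(quantities.dtype, '->', small.dtype)
```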

### Pandas: aggregation and groupby

```python
# Groupby with multiple named aggregations
summary = (
    df.groupby(['region', 'category'])
    .agg(
        total_revenue=('revenue', 'sum'),
        order_count=('order_id', 'nunique'),
        avg_order_value=('revenue', 'mean'),  # mean per row: equals AOV only with one row per order
    )
    .reset_index()
    .sort_values('total_revenue', ascending=False)
)
```

### Pandas: merging

```python
# Always pass how= explicitly; never rely on the default (inner)
result = pd.merge(
    orders,
    customers,
    on='customer_id',
    how='left',        # explicit
    validate='m:1',    # checks cardinality; raises pandas.errors.MergeError on violation
    suffixes=('_order', '_customer'),
)
```
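
Using `validate=` together with `indicator=True` makes a merge self-checking. In this sketch with made-up frames, `indicator=True` adds a `_merge` column that exposes orders without a matching customer. A duplicated customer key makes the `m:1` validation raise `pandas.errors.MergeError`.

```python
import pandas as pd

# Made-up frames: order 3 references a customer that does not exist
orders = pd.DataFrame({'order_id': [1, 2, 3], 'customer_id': ['c1', 'c2', 'c9']})
customers = pd.DataFrame({'customer_id': ['c1', 'c2'], 'name': ['Ada', 'Bo']})

# indicator=True adds a categorical '_merge' column: 'left_only', 'right_only' or 'both'
result = pd.merge(orders, customers, on='customer_id', how='left', validate='m:1', indicator=True)
unmatched = result[result['_merge'] == 'left_only']
print(unmatched[['order_id', 'customer_id']])

# A duplicated key on the "one" side breaks the m:1 contract and fails loudly
dup_customers = pd.concat([customers, customers.iloc[[0]]], ignore_index=True)
merge_error = None
try:
    pd.merge(orders, dup_customers, on='customer_id', how='left', validate='m:1')
except pd.errors.MergeError as exc:
    merge_error = exc
print('MergeError raised:', merge_error is not None)
```

Without `validate`, the second merge would silently return four rows and duplicate order 1.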

### Polars: when to use it instead of pandas

Use Polars when:

- The dataset has more than 1M rows (Polars can be 5–100x faster for many operations; benchmark your own workload)
- You need lazy evaluation (the query is optimized before it runs)
- Parallelism matters (Polars uses all CPU cores by default)

```python
import polars as pl

# Lazy API: the query is optimized before execution
result = (
    pl.scan_parquet("orders.parquet")  # lazy scan; no data loaded yet
    .filter(pl.col("status") == "completed")
    .group_by(["region", "category"])
    .agg([
        pl.col("revenue").sum().alias("total_revenue"),
        pl.col("order_id").n_unique().alias("order_count"),
        pl.col("revenue").mean().alias("avg_order_value"),
    ])
    .sort("total_revenue", descending=True)
    .collect()  # execute now
)
```

### Polars: expressions (no chained indexing)

```python
# Polars: no SettingWithCopyWarning, no chained indexing.
# with_columns returns a new DataFrame; nothing is modified in place.
df = df.with_columns([
    (pl.col("price") * 0.2).alias("tax"),
    pl.col("name").str.to_uppercase().alias("name_upper"),
    pl.when(pl.col("quantity") > 10)
    .then(pl.lit("bulk"))
    .otherwise(pl.lit("standard"))
    .alias("order_type"),
])
```

### Data validation pattern

```python
import pandas as pd


def validate_orders(df: pd.DataFrame) -> None:
    # Note: assert statements are skipped when Python runs with -O
    assert df['order_id'].notna().all(), "order_id has nulls"
    assert df['order_id'].is_unique, "order_id has duplicates"
    assert (df['amount'] >= 0).all(), "amount has negative values"
    assert df['status'].isin(['pending', 'completed', 'cancelled']).all(), "invalid status values"
    assert pd.to_datetime(df['created_at'], errors='coerce').notna().all(), "created_at has invalid dates"
```
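
Python removes `assert` statements when it runs with `-O`, so checks that must always run are safer as explicit exceptions. The variant below evaluates every rule and raises one `ValueError` listing all failures. The name `validate_orders_all` is hypothetical.

```python
import pandas as pd

ALLOWED_STATUSES = ['pending', 'completed', 'cancelled']


def validate_orders_all(df: pd.DataFrame) -> None:
    """Evaluate every rule, then raise a single ValueError listing all failures."""
    checks = {
        "order_id has nulls": df['order_id'].notna().all(),
        "order_id has duplicates": df['order_id'].is_unique,
        # nulls also fail this comparison
        "amount has negative values": (df['amount'] >= 0).all(),
        "invalid status values": df['status'].isin(ALLOWED_STATUSES).all(),
        "created_at has invalid dates": pd.to_datetime(df['created_at'], errors='coerce').notna().all(),
    }
    failures = [message for message, passed in checks.items() if not passed]
    if failures:
        raise ValueError("; ".join(failures))


bad = pd.DataFrame({
    'order_id': [1, 1],
    'amount': [-1.0, 5.0],
    'status': ['pending', 'shipped'],
    'created_at': ['2024-01-01', '2024-01-02'],
})
try:
    validate_orders_all(bad)
except ValueError as exc:
    print(exc)
```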

### Format conversion

```python
# Reading
df = pd.read_parquet("data.parquet", columns=['id', 'name', 'amount'])  # read only the columns you need
df = pd.read_csv("data.csv", dtype={'id': str}, parse_dates=['created_at'])

# Writing: always prefer Parquet over CSV for large datasets
# (read_parquet/to_parquet need pyarrow or fastparquet installed)
df.to_parquet("output.parquet", index=False, compression='snappy')
```
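
Passing `dtype=` explicitly matters most for identifier columns. In this made-up CSV the IDs have leading zeros, and type inference silently turns them into integers.

```python
import io

import pandas as pd

# Made-up CSV whose IDs carry leading zeros
raw = "id,amount,created_at\n00123,10.5,2024-01-05\n00456,3.0,2024-01-06\n"

inferred = pd.read_csv(io.StringIO(raw))
explicit = pd.read_csv(io.StringIO(raw), dtype={'id': str}, parse_dates=['created_at'])

print(inferred['id'].tolist())       # inferred as integers: leading zeros are gone
print(explicit['id'].tolist())       # kept as strings
print(explicit['created_at'].dtype)  # a datetime64 dtype instead of plain strings
```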

## Example

**User:** Clean a raw orders CSV: fix data types, remove duplicates, handle null values, add derived columns (revenue_after_tax, order_size_bucket), and write out a validated Parquet file.

**Expected output:**

- Read with explicit `dtype=` and `parse_dates=`
- Drop duplicate `order_id` rows (keep the last one)
- Fill nulls: `quantity` → 0, `discount` → 0.0; drop rows where `customer_id` is null
- Derive: `revenue_after_tax = price * quantity * (1 - discount) * 0.8`
- Bucket: `order_size_bucket` = 'small' (< 100), 'medium' (100–1000), 'large' (> 1000)
- Validate with assertions before writing
- Write to Parquet with Snappy compression
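
Here is a minimal end-to-end sketch of the expected output. It uses an inline CSV so it runs anywhere, and the sample rows are made up. The task does not say which value `order_size_bucket` is based on, so this sketch assumes `revenue_after_tax`. It uses `np.select` rather than `pd.cut` so that both 100 and 1000 fall into 'medium', matching the stated ranges. The Parquet write is commented out because it needs `pyarrow` or `fastparquet`.

```python
import io

import numpy as np
import pandas as pd

# Made-up raw export with a duplicate order, a null customer, and null quantity/discount
RAW_CSV = """order_id,customer_id,price,quantity,discount,created_at
1,c1,20.0,2,0.1,2024-01-01
1,c1,20.0,3,0.1,2024-01-02
2,,50.0,1,0.0,2024-01-03
3,c3,400.0,2,,2024-01-04
4,c4,500.0,5,0.2,2024-01-05
5,c5,10.0,,0.0,2024-01-06
"""


def clean_orders(csv_text: str) -> pd.DataFrame:
    # Read with explicit dtypes so IDs stay strings and dates are parsed
    df = pd.read_csv(
        io.StringIO(csv_text),
        dtype={'order_id': str, 'customer_id': str},
        parse_dates=['created_at'],
    )
    df = df.drop_duplicates(subset='order_id', keep='last')  # keep the latest row per order
    df = df.dropna(subset=['customer_id'])                    # customer_id is required
    df = df.fillna({'quantity': 0, 'discount': 0.0})
    df['quantity'] = df['quantity'].astype('int64')

    df['revenue_after_tax'] = df['price'] * df['quantity'] * (1 - df['discount']) * 0.8

    # Assumption: buckets are based on revenue_after_tax; 100 and 1000 both count as 'medium'
    value = df['revenue_after_tax']
    df['order_size_bucket'] = np.select([value < 100, value <= 1000], ['small', 'medium'], default='large')

    # Validate before anything is written
    assert df['order_id'].is_unique, "order_id has duplicates"
    assert df['customer_id'].notna().all(), "customer_id has nulls"
    assert (df['revenue_after_tax'] >= 0).all(), "revenue_after_tax is negative"
    return df.reset_index(drop=True)


cleaned = clean_orders(RAW_CSV)
print(cleaned[['order_id', 'quantity', 'revenue_after_tax', 'order_size_bucket']])

# Needs pyarrow or fastparquet installed:
# cleaned.to_parquet("orders_clean.parquet", index=False, compression='snappy')
```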

---