@revos/cli 0.2.0 → 0.2.2
- package/LICENSE +9 -0
- package/README.md +286 -41
- package/dist/adapters/oclif/commands/action-runs/get.mjs +1 -1
- package/dist/adapters/oclif/commands/action-runs/list.mjs +8 -2
- package/dist/adapters/oclif/commands/actions/get-input-schema.mjs +2 -2
- package/dist/adapters/oclif/commands/actions/get-params-schema.mjs +2 -2
- package/dist/adapters/oclif/commands/actions/get.mjs +1 -1
- package/dist/adapters/oclif/commands/actions/list.mjs +8 -4
- package/dist/adapters/oclif/commands/ai-instructions/create.mjs +1 -1
- package/dist/adapters/oclif/commands/ai-instructions/delete.mjs +1 -1
- package/dist/adapters/oclif/commands/ai-instructions/get.mjs +1 -1
- package/dist/adapters/oclif/commands/ai-instructions/list.mjs +8 -2
- package/dist/adapters/oclif/commands/ai-instructions/update.mjs +1 -1
- package/dist/adapters/oclif/commands/api.d.mts +11 -0
- package/dist/adapters/oclif/commands/api.mjs +112 -0
- package/dist/adapters/oclif/commands/apply.d.mts +28 -0
- package/dist/adapters/oclif/commands/apply.mjs +77 -0
- package/dist/adapters/oclif/commands/auth/login.d.mts +5 -4
- package/dist/adapters/oclif/commands/auth/login.mjs +22 -11
- package/dist/adapters/oclif/commands/auth/logout.d.mts +1 -1
- package/dist/adapters/oclif/commands/auth/logout.mjs +7 -3
- package/dist/adapters/oclif/commands/auth/status.d.mts +2 -2
- package/dist/adapters/oclif/commands/auth/status.mjs +2 -2
- package/dist/adapters/oclif/commands/connections/create.d.mts +6 -0
- package/dist/adapters/oclif/commands/connections/create.mjs +8 -0
- package/dist/adapters/oclif/commands/connections/delete.d.mts +6 -0
- package/dist/adapters/oclif/commands/connections/delete.mjs +8 -0
- package/dist/adapters/oclif/commands/connections/get.d.mts +6 -0
- package/dist/adapters/oclif/commands/connections/get.mjs +8 -0
- package/dist/adapters/oclif/commands/connections/list.d.mts +6 -0
- package/dist/adapters/oclif/commands/connections/list.mjs +14 -0
- package/dist/adapters/oclif/commands/connections/update.d.mts +6 -0
- package/dist/adapters/oclif/commands/connections/update.mjs +8 -0
- package/dist/adapters/oclif/commands/cubes/create.d.mts +6 -0
- package/dist/adapters/oclif/commands/cubes/create.mjs +8 -0
- package/dist/adapters/oclif/commands/cubes/delete.d.mts +6 -0
- package/dist/adapters/oclif/commands/cubes/delete.mjs +8 -0
- package/dist/adapters/oclif/commands/cubes/get.d.mts +6 -0
- package/dist/adapters/oclif/commands/cubes/get.mjs +8 -0
- package/dist/adapters/oclif/commands/cubes/list.d.mts +6 -0
- package/dist/adapters/oclif/commands/cubes/list.mjs +13 -0
- package/dist/adapters/oclif/commands/cubes/update.d.mts +6 -0
- package/dist/adapters/oclif/commands/cubes/update.mjs +8 -0
- package/dist/adapters/oclif/commands/diff.d.mts +27 -0
- package/dist/adapters/oclif/commands/diff.mjs +66 -0
- package/dist/adapters/oclif/commands/gservice-account-keys/get.mjs +1 -1
- package/dist/adapters/oclif/commands/gservice-account-keys/reveal.mjs +2 -2
- package/dist/adapters/oclif/commands/gservice-accounts/create.mjs +1 -1
- package/dist/adapters/oclif/commands/gservice-accounts/delete.mjs +1 -1
- package/dist/adapters/oclif/commands/gservice-accounts/get.mjs +1 -1
- package/dist/adapters/oclif/commands/gservice-accounts/list.mjs +7 -2
- package/dist/adapters/oclif/commands/init.d.mts +2 -1
- package/dist/adapters/oclif/commands/init.mjs +28 -24
- package/dist/adapters/oclif/commands/org/create.mjs +1 -1
- package/dist/adapters/oclif/commands/org/current.d.mts +2 -2
- package/dist/adapters/oclif/commands/org/current.mjs +2 -2
- package/dist/adapters/oclif/commands/org/get.mjs +1 -1
- package/dist/adapters/oclif/commands/org/list.d.mts +3 -11
- package/dist/adapters/oclif/commands/org/list.mjs +26 -26
- package/dist/adapters/oclif/commands/org/switch.d.mts +3 -2
- package/dist/adapters/oclif/commands/org/switch.mjs +13 -5
- package/dist/adapters/oclif/commands/pull.d.mts +28 -0
- package/dist/adapters/oclif/commands/pull.mjs +88 -0
- package/dist/adapters/oclif/commands/score-groups/create.mjs +3 -2
- package/dist/adapters/oclif/commands/score-groups/delete.mjs +1 -1
- package/dist/adapters/oclif/commands/score-groups/get.mjs +1 -1
- package/dist/adapters/oclif/commands/score-groups/list.mjs +3 -2
- package/dist/adapters/oclif/commands/score-groups/update.mjs +1 -1
- package/dist/adapters/oclif/commands/scores/create.mjs +3 -2
- package/dist/adapters/oclif/commands/scores/delete.mjs +1 -1
- package/dist/adapters/oclif/commands/scores/list.mjs +3 -2
- package/dist/adapters/oclif/commands/scores/update.mjs +1 -1
- package/dist/adapters/oclif/commands/segments/create.mjs +1 -1
- package/dist/adapters/oclif/commands/segments/delete.mjs +1 -1
- package/dist/adapters/oclif/commands/segments/evaluate.mjs +2 -2
- package/dist/adapters/oclif/commands/segments/get-evaluation-history.mjs +2 -2
- package/dist/adapters/oclif/commands/segments/get-version.mjs +2 -2
- package/dist/adapters/oclif/commands/segments/get.mjs +1 -1
- package/dist/adapters/oclif/commands/segments/list-versions.mjs +16 -5
- package/dist/adapters/oclif/commands/segments/list.mjs +9 -2
- package/dist/adapters/oclif/commands/segments/restore-version.mjs +2 -2
- package/dist/adapters/oclif/commands/segments/update.mjs +1 -1
- package/dist/adapters/oclif/commands/sources/create.d.mts +11 -0
- package/dist/adapters/oclif/commands/sources/create.mjs +16 -0
- package/dist/adapters/oclif/commands/sources/delete.d.mts +6 -0
- package/dist/adapters/oclif/commands/sources/delete.mjs +8 -0
- package/dist/adapters/oclif/commands/sources/get.d.mts +6 -0
- package/dist/adapters/oclif/commands/sources/get.mjs +8 -0
- package/dist/adapters/oclif/commands/sources/list-streams.d.mts +6 -0
- package/dist/adapters/oclif/commands/sources/list-streams.mjs +31 -0
- package/dist/adapters/oclif/commands/sources/list.d.mts +6 -0
- package/dist/adapters/oclif/commands/sources/list.mjs +13 -0
- package/dist/adapters/oclif/commands/sources/update.d.mts +15 -0
- package/dist/adapters/oclif/commands/sources/update.mjs +21 -0
- package/dist/adapters/oclif/commands/status.d.mts +26 -0
- package/dist/adapters/oclif/commands/status.mjs +77 -0
- package/dist/adapters/oclif/commands/table-views/create.mjs +3 -2
- package/dist/adapters/oclif/commands/table-views/delete.mjs +1 -1
- package/dist/adapters/oclif/commands/table-views/list.mjs +3 -2
- package/dist/adapters/oclif/commands/table-views/update.mjs +1 -1
- package/dist/adapters/oclif/commands/tables/create.mjs +1 -1
- package/dist/adapters/oclif/commands/tables/delete.mjs +1 -1
- package/dist/adapters/oclif/commands/tables/get.mjs +1 -1
- package/dist/adapters/oclif/commands/tables/list.mjs +3 -2
- package/dist/adapters/oclif/commands/tables/update.mjs +1 -1
- package/dist/{base.command-d7VW6WTp.d.mts → base.command-D7X3ZNtY.d.mts} +0 -1
- package/dist/{base.command-DlVQ9Cqa.mjs → base.command-cV5d65r8.mjs} +15 -12
- package/dist/chunk-CfYAbeIz.mjs +13 -0
- package/dist/core-CMrP5BQS.mjs +2378 -0
- package/dist/{factory-D9sR_S_g.mjs → factory-C6XLqhT9.mjs} +44 -10
- package/dist/iac-render-BSZZEP0n.mjs +17 -0
- package/dist/index-BqKwXXAo.d.mts +598 -0
- package/dist/index.d.mts +3 -4
- package/dist/index.mjs +2 -2
- package/dist/{presets-Cvazkjmu.mjs → presets-CJbFbHlw.mjs} +35 -8
- package/dist/templates/.claude/settings.json +39 -0
- package/dist/templates/.devcontainer/devcontainer.json +2 -2
- package/dist/templates/.devcontainer/setup.sh +3 -0
- package/dist/templates/AGENTS.md +36 -13
- package/dist/templates/dbt/dbt_project.yml +2 -2
- package/dist/templates/skills/create-connections/SKILL.md +210 -0
- package/dist/templates/skills/create-connections/references/mappers.md +152 -0
- package/dist/templates/skills/{create-semantic-model → create-cubes}/SKILL.md +28 -26
- package/dist/templates/skills/create-cubes/references/bq-pk-fk-conventions.md +183 -0
- package/dist/templates/skills/{create-semantic-model → create-cubes}/references/cube-examples.md +85 -7
- package/dist/templates/skills/create-cubes/references/hubspot-entities.md +289 -0
- package/dist/templates/skills/create-cubes/references/jira-entities.md +201 -0
- package/dist/templates/skills/create-cubes/references/netsuite-entities.md +121 -0
- package/dist/templates/skills/create-cubes/references/stripe-entities.md +114 -0
- package/dist/templates/skills/create-dbt-transformations/SKILL.md +62 -33
- package/dist/templates/skills/create-dbt-transformations/references/edge-cases.md +21 -3
- package/dist/templates/skills/create-dbt-transformations/references/schema-conventions.md +21 -7
- package/dist/templates/skills/create-dbt-transformations/references/sql-templates.md +34 -20
- package/dist/templates/skills/explore-lakehouse/SKILL.md +8 -4
- package/dist/templates/skills/load-sample-data/SKILL.md +119 -0
- package/dist/templates/skills/visualize-semantic-model/SKILL.md +159 -0
- package/dist/templates/skills/visualize-semantic-model/scripts/render_graph.py +186 -0
- package/dist/{types-Y_ht_ja5.d.mts → types-CGjxcj4L.d.mts} +3 -0
- package/package.json +48 -6
- package/dist/adapters/oclif/commands/overlays/diff.d.mts +0 -19
- package/dist/adapters/oclif/commands/overlays/diff.mjs +0 -80
- package/dist/adapters/oclif/commands/overlays/pull.d.mts +0 -15
- package/dist/adapters/oclif/commands/overlays/pull.mjs +0 -44
- package/dist/adapters/oclif/commands/overlays/push.d.mts +0 -18
- package/dist/adapters/oclif/commands/overlays/push.mjs +0 -59
- package/dist/adapters/oclif/commands/overlays/status.d.mts +0 -18
- package/dist/adapters/oclif/commands/overlays/status.mjs +0 -53
- package/dist/core-gKJ_V-K5.mjs +0 -973
- package/dist/index-KAzwt5vr.d.mts +0 -190
- package/dist/types-C_p_6rkj.d.mts +0 -69
- package/dist/templates/skills/{create-semantic-model → create-cubes}/references/key-patterns.md +0 -0
- package/dist/templates/skills/{create-semantic-model → create-cubes}/references/validation-queries.md +0 -0
package/dist/index.d.mts
CHANGED
|
@@ -1,4 +1,3 @@
|
|
|
1
|
-
import {
|
|
2
|
-
import {
|
|
3
|
-
|
|
4
|
-
export { ApiClient, ApiError, AuthResult, AuthStatusInfo, ClerkEnv, ClerkOAuthConfig, ClerkUserInfo, Config, CubeDefinition, CubeOverlay, DiffChange, DiffContext, DiffEntry, DiffOptions, DiffResult, DiffService, InitOptions, InitResult, InitService, LoadedOverlay, OAuthCallbackResult, OAuthServerResult, OrgListResult, OrgSwitchResult, OrganizationInfo, OverlayFile, OverlayFileData, OverlayStatusInfo, PKCEChallenge, PullContext, PullOptions, PullResult, PullService, PushContext, PushOptions, PushResult, PushService, StatusContext, StatusOptions, StatusResult, StatusService, StoredCredentials, SyncStatus, TokenResponse, buildAuthorizationUrl, createApiClient, deleteCredentials, exchangeCodeForTokens, findRemoteOnlyOverlays, formatError, generatePKCEChallenge, getConfig, getCredentialsPath, getLocalOverlayNames, getUserInfo, isContentEqual, isTokenExpired, loadCredentials, loadOverlayFile, loadOverlays, loadOverlaysByNames, loadOverlaysFromDir, refreshAccessToken, sanitizeFileName, saveCredentials, saveOverlayToFile, setClerkConfig, setClerkEnv, startOAuthServer, tokenResponseToCredentials, unwrap };
|
|
1
|
+
import { A as deleteCredentials, C as getUserInfo, D as tokenResponseToCredentials, E as setAuthEnv, F as DEFAULT_API_URL, I as getConfig, L as ApiError, M as isTokenExpired, N as loadCredentials, O as OAuthServerResult, P as saveCredentials, R as Config, S as getActiveAuthConfig, T as setAuthConfig, _ as AuthEnv, b as exchangeCodeForTokens, d as unwrap, f as resolveAppUrl, g as AuthConfig, h as AUTH_ENVS, i as index_d_exports, j as getCredentialsPath, k as startOAuthServer, l as ApiClient, m as sanitizeFileName, n as InitResult, p as formatError, r as InitService, t as InitOptions, u as createApiClient, v as PKCEChallenge, w as refreshAccessToken, x as generatePKCEChallenge, y as buildAuthorizationUrl } from "./index-BqKwXXAo.mjs";
|
|
2
|
+
import { a as OrgListResult, c as StoredCredentials, i as OAuthCallbackResult, l as TokenResponse, n as AuthStatusInfo, o as OrgSwitchResult, r as ClerkUserInfo, s as OrganizationInfo, t as AuthResult } from "./types-CGjxcj4L.mjs";
|
|
3
|
+
export { AUTH_ENVS, ApiClient, ApiError, AuthConfig, AuthEnv, AuthResult, AuthStatusInfo, ClerkUserInfo, Config, DEFAULT_API_URL, InitOptions, InitResult, InitService, OAuthCallbackResult, OAuthServerResult, OrgListResult, OrgSwitchResult, OrganizationInfo, PKCEChallenge, StoredCredentials, TokenResponse, buildAuthorizationUrl, createApiClient, deleteCredentials, exchangeCodeForTokens, formatError, generatePKCEChallenge, getActiveAuthConfig, getConfig, getCredentialsPath, getUserInfo, index_d_exports as iac, isTokenExpired, loadCredentials, refreshAccessToken, resolveAppUrl, sanitizeFileName, saveCredentials, setAuthConfig, setAuthEnv, startOAuthServer, tokenResponseToCredentials, unwrap };
|
package/dist/index.mjs
CHANGED
|
@@ -1,2 +1,2 @@
|
|
|
1
|
-
import { A as
|
|
2
|
-
export { ApiError,
|
|
1
|
+
import { A as deleteCredentials, C as getActiveAuthConfig, D as setAuthEnv, E as setAuthConfig, F as ApiError, M as isTokenExpired, N as loadCredentials, O as tokenResponseToCredentials, P as saveCredentials, S as generatePKCEChallenge, T as refreshAccessToken, _ as DEFAULT_API_URL, b as buildAuthorizationUrl, f as createApiClient, g as sanitizeFileName, h as formatError, j as getCredentialsPath, k as startOAuthServer, m as resolveAppUrl, p as unwrap, r as iac_exports, t as InitService, v as getConfig, w as getUserInfo, x as exchangeCodeForTokens, y as AUTH_ENVS } from "./core-CMrP5BQS.mjs";
|
|
2
|
+
export { AUTH_ENVS, ApiError, DEFAULT_API_URL, InitService, buildAuthorizationUrl, createApiClient, deleteCredentials, exchangeCodeForTokens, formatError, generatePKCEChallenge, getActiveAuthConfig, getConfig, getCredentialsPath, getUserInfo, iac_exports as iac, isTokenExpired, loadCredentials, refreshAccessToken, resolveAppUrl, sanitizeFileName, saveCredentials, setAuthConfig, setAuthEnv, startOAuthServer, tokenResponseToCredentials, unwrap };
|
package/dist/{presets-Cvazkjmu.mjs → presets-CJbFbHlw.mjs}
CHANGED
|
@@ -1,21 +1,36 @@
|
|
|
1
|
-
import {
|
|
2
|
-
import { n as defineApiCommand, t as bodyFlag } from "./factory-
|
|
1
|
+
import { p as unwrap } from "./core-CMrP5BQS.mjs";
|
|
2
|
+
import { n as createListRender, r as defineApiCommand, t as bodyFlag } from "./factory-C6XLqhT9.mjs";
|
|
3
3
|
import { Args, Flags } from "@oclif/core";
|
|
4
4
|
//#region src/adapters/oclif/presets.ts
|
|
5
|
+
function camelToKebab(s) {
|
|
6
|
+
return s.replace(/[A-Z]/g, (c) => `-${c.toLowerCase()}`);
|
|
7
|
+
}
|
|
5
8
|
const listFlags = {
|
|
6
9
|
"page-size": Flags.integer({ description: "Maximum number of items to return" }),
|
|
7
10
|
"page-token": Flags.string({ description: "Token for the next page (from previous response)" }),
|
|
8
11
|
"order-by": Flags.string({ description: "Field to order results by (e.g. 'createdAt desc')" }),
|
|
9
12
|
filter: Flags.string({ description: "Filter expression" }),
|
|
10
|
-
fields: Flags.string({ description: "Comma-separated list of fields to include" })
|
|
13
|
+
fields: Flags.string({ description: "Comma-separated list of fields to include" }),
|
|
14
|
+
columns: Flags.string({
|
|
15
|
+
description: "Columns to display in table output (comma-separated). Overrides the resource default. Ignored with --json.",
|
|
16
|
+
helpValue: "a,b,c"
|
|
17
|
+
})
|
|
11
18
|
};
|
|
12
19
|
function getResource(api, key) {
|
|
13
20
|
return api[key];
|
|
14
21
|
}
|
|
15
22
|
function listCommand(spec) {
|
|
23
|
+
const pathParams = spec.pathParams ?? [];
|
|
24
|
+
const pathFlags = Object.fromEntries(pathParams.map((name) => [camelToKebab(name), Flags.string({
|
|
25
|
+
description: `${name} path parameter`,
|
|
26
|
+
required: true
|
|
27
|
+
})]));
|
|
16
28
|
return defineApiCommand({
|
|
17
29
|
description: spec.description,
|
|
18
|
-
flags:
|
|
30
|
+
flags: {
|
|
31
|
+
...listFlags,
|
|
32
|
+
...pathFlags
|
|
33
|
+
},
|
|
19
34
|
call: async ({ api, flags }) => {
|
|
20
35
|
const resource = getResource(api, spec.resource);
|
|
21
36
|
if (!resource.list) throw new Error(`Resource '${String(spec.resource)}' has no list method`);
|
|
@@ -25,8 +40,10 @@ function listCommand(spec) {
|
|
|
25
40
|
if (flags["order-by"] !== void 0) params.orderBy = flags["order-by"];
|
|
26
41
|
if (flags.filter !== void 0) params.filter = flags.filter;
|
|
27
42
|
if (flags.fields !== void 0) params.fields = flags.fields;
|
|
43
|
+
for (const name of pathParams) params[name] = flags[camelToKebab(name)];
|
|
28
44
|
return unwrap(await resource.list(Object.keys(params).length > 0 ? params : void 0));
|
|
29
|
-
}
|
|
45
|
+
},
|
|
46
|
+
render: createListRender(spec.defaultColumns)
|
|
30
47
|
});
|
|
31
48
|
}
|
|
32
49
|
function getCommand(spec) {
|
|
@@ -44,14 +61,24 @@ function getCommand(spec) {
|
|
|
44
61
|
});
|
|
45
62
|
}
|
|
46
63
|
function createCommand(spec) {
|
|
64
|
+
const pathParams = spec.pathParams ?? [];
|
|
65
|
+
const pathFlags = Object.fromEntries(pathParams.map((name) => [camelToKebab(name), Flags.string({
|
|
66
|
+
description: `${name} path parameter`,
|
|
67
|
+
required: true
|
|
68
|
+
})]));
|
|
47
69
|
return defineApiCommand({
|
|
48
70
|
description: spec.description,
|
|
49
|
-
flags: {
|
|
50
|
-
|
|
71
|
+
flags: {
|
|
72
|
+
body: bodyFlag,
|
|
73
|
+
...pathFlags
|
|
74
|
+
},
|
|
75
|
+
call: async ({ api, body, flags }) => {
|
|
51
76
|
const resource = getResource(api, spec.resource);
|
|
52
77
|
if (!resource.create) throw new Error(`Resource '${String(spec.resource)}' has no create method`);
|
|
53
78
|
if (body === void 0) throw new Error("--body is required (inline JSON, '@path' for file, or '-' for stdin)");
|
|
54
|
-
|
|
79
|
+
const params = { body };
|
|
80
|
+
for (const name of pathParams) params[name] = flags[camelToKebab(name)];
|
|
81
|
+
return unwrap(await resource.create(params));
|
|
55
82
|
}
|
|
56
83
|
});
|
|
57
84
|
}
|
package/dist/templates/.claude/settings.json
ADDED
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
{
|
|
2
|
+
"permissions": {
|
|
3
|
+
"deny": [
|
|
4
|
+
"Read(~/.revos/**)",
|
|
5
|
+
"Read(~/.config/gcloud/**)",
|
|
6
|
+
"Read(/tmp/.revos-gsa-creds.json)",
|
|
7
|
+
"Read(/tmp/.revos-credentials.json)",
|
|
8
|
+
"Read(~/.claude/**)",
|
|
9
|
+
"Read(//proc/*/environ)",
|
|
10
|
+
"Edit(~/.revos/**)",
|
|
11
|
+
"Write(~/.revos/**)",
|
|
12
|
+
"Edit(~/.config/gcloud/**)",
|
|
13
|
+
"Write(~/.config/gcloud/**)",
|
|
14
|
+
"Edit(.claude/settings*.json)",
|
|
15
|
+
"Write(.claude/settings*.json)",
|
|
16
|
+
"Edit(.claude/hooks/**)",
|
|
17
|
+
"Write(.claude/hooks/**)",
|
|
18
|
+
"Edit(.claude/commands/**)",
|
|
19
|
+
"Write(.claude/commands/**)",
|
|
20
|
+
"Edit(.mcp.json)",
|
|
21
|
+
"Write(.mcp.json)",
|
|
22
|
+
"Bash(curl:*)",
|
|
23
|
+
"Bash(wget:*)",
|
|
24
|
+
"Bash(nc:*)",
|
|
25
|
+
"Bash(ncat:*)",
|
|
26
|
+
"Bash(socat:*)",
|
|
27
|
+
"Bash(env)",
|
|
28
|
+
"Bash(env:*)",
|
|
29
|
+
"Bash(printenv)",
|
|
30
|
+
"Bash(printenv:*)",
|
|
31
|
+
"Bash(gcloud auth print-access-token:*)",
|
|
32
|
+
"Bash(gcloud auth print-identity-token:*)",
|
|
33
|
+
"Bash(gcloud auth application-default print-access-token:*)"
|
|
34
|
+
]
|
|
35
|
+
},
|
|
36
|
+
"sandbox": {
|
|
37
|
+
"enabled": true
|
|
38
|
+
}
|
|
39
|
+
}
|
package/dist/templates/.devcontainer/devcontainer.json
CHANGED
|
@@ -22,12 +22,12 @@
|
|
|
22
22
|
"postCreateCommand": "bash .devcontainer/setup.sh",
|
|
23
23
|
"mounts": [
|
|
24
24
|
{
|
|
25
|
-
"source": "${localEnv:HOME}/.revos/<%=projectSlug%>-gsa-creds.json",
|
|
25
|
+
"source": "${localEnv:HOME}${localEnv:USERPROFILE}/.revos/<%=projectSlug%>-gsa-creds.json",
|
|
26
26
|
"target": "/tmp/.revos-gsa-creds.json",
|
|
27
27
|
"type": "bind"
|
|
28
28
|
},
|
|
29
29
|
{
|
|
30
|
-
"source": "${localEnv:HOME}/.revos/credentials.json",
|
|
30
|
+
"source": "${localEnv:HOME}${localEnv:USERPROFILE}/.revos/credentials.json",
|
|
31
31
|
"target": "/tmp/.revos-credentials.json",
|
|
32
32
|
"type": "bind"
|
|
33
33
|
},
|
package/dist/templates/.devcontainer/setup.sh
CHANGED
|
@@ -24,6 +24,9 @@ if [ -n "${GOOGLE_CLOUD_PROJECT:-}" ]; then
|
|
|
24
24
|
gcloud config set project "$GOOGLE_CLOUD_PROJECT"
|
|
25
25
|
fi
|
|
26
26
|
|
|
27
|
+
# Fix Claude Code credentials permissions (volume is created as root)
|
|
28
|
+
sudo chown -R vscode:vscode /home/vscode/.claude 2>/dev/null || true
|
|
29
|
+
|
|
27
30
|
# Install RevOS CLI
|
|
28
31
|
npm install -g @revos/cli
|
|
29
32
|
|
package/dist/templates/AGENTS.md
CHANGED
|
@@ -4,19 +4,42 @@ This is a RevOS data engineering project for **<%=orgName%>** organization.
|
|
|
4
4
|
|
|
5
5
|
## Project Structure
|
|
6
6
|
|
|
7
|
-
- `dbt/models/bronze/` — raw
|
|
8
|
-
- `dbt/models/silver/` — cleaned & conformed
|
|
9
|
-
- `dbt/models/gold/` — business-ready marts
|
|
10
|
-
- `
|
|
7
|
+
- `dbt/models/bronze/` — `schema.yml` only (declares raw tables as dbt sources; no SQL files)
|
|
8
|
+
- `dbt/models/silver/` — cleaned & conformed; reads raw via `{{ source('bronze', '<table>') }}`
|
|
9
|
+
- `dbt/models/gold/` — business-ready marts; reads silver via `{{ ref() }}`
|
|
10
|
+
- `cubes/` — first-class Cube.dev semantic model definitions
|
|
11
11
|
|
|
12
12
|
## Key Commands
|
|
13
13
|
|
|
14
|
-
| Command
|
|
15
|
-
|
|
|
16
|
-
| `revos auth login`
|
|
17
|
-
| `revos
|
|
18
|
-
| `revos
|
|
19
|
-
| `revos
|
|
20
|
-
| `
|
|
21
|
-
| `
|
|
22
|
-
| `
|
|
14
|
+
| Command | Description |
|
|
15
|
+
| --------------------------- | ------------------------------------------ |
|
|
16
|
+
| `revos auth login` | Authenticate with RevOS |
|
|
17
|
+
| `revos apply` | Reconcile local Connections/Cubes with API |
|
|
18
|
+
| `revos pull` | Pull current Connections/Cubes from API |
|
|
19
|
+
| `revos diff` | Compare local YAML against the API |
|
|
20
|
+
| `revos status` | Show sync status of local resources |
|
|
21
|
+
| `revos sources list` | List data sources |
|
|
22
|
+
| `revos sources get <id>` | Get a data source by ID |
|
|
23
|
+
| `revos sources create` | Open RevOS UI to add a data source |
|
|
24
|
+
| `revos sources update <id>` | Open RevOS UI to edit a data source |
|
|
25
|
+
| `revos sources delete <id>` | Delete a data source |
|
|
26
|
+
| `dbt run` | Run dbt models |
|
|
27
|
+
| `dbt test` | Test dbt models |
|
|
28
|
+
| `bq ls` | List BigQuery datasets/tables |
|
|
29
|
+
|
|
30
|
+
## Data Sources
|
|
31
|
+
|
|
32
|
+
When the user wants to add or manage data sources:
|
|
33
|
+
|
|
34
|
+
- To add a new data source: suggest `revos sources create` (opens RevOS UI)
|
|
35
|
+
- To view existing sources: suggest `revos sources list`
|
|
36
|
+
- If a BigQuery dataset is empty or has no tables, mention that a data source can be configured with `revos sources create`, or sample data can be loaded if available
|
|
37
|
+
|
|
38
|
+
## Security & sandboxing
|
|
39
|
+
|
|
40
|
+
`.claude/settings.json` in this project configures two layers of protection:
|
|
41
|
+
|
|
42
|
+
- **Permission deny rules** block reading credential paths (`~/.revos/`, `~/.config/gcloud/`, `~/.claude/`) and editing `.claude/settings.json`, hooks, and MCP config. These tool-layer rules always apply to Claude's built-in Read/Edit/Write tools.
|
|
43
|
+
- **OS-level sandbox** (`sandbox.enabled: true`) extends the same restrictions to Bash subprocesses via Seatbelt (macOS) or bubblewrap (Linux).
|
|
44
|
+
|
|
45
|
+
**Inside this Dev Container the sandbox cannot start** — Docker's default seccomp profile blocks the namespace-creation syscalls bubblewrap needs. Claude Code prints a one-time warning at launch and runs without OS-level isolation. In that fallback state, deny rules still cover Claude's Read/Edit/Write tools, but Bash commands like `cat ~/.revos/credentials.json` are not blocked. The sandbox does activate when Claude Code runs on a macOS host directly, outside the container.
|
package/dist/templates/skills/create-connections/SKILL.md
ADDED
|
@@ -0,0 +1,210 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: create-connections
|
|
3
|
+
description: >
|
|
4
|
+
Create a RevOS Connection YAML that syncs data from a Source into the
|
|
5
|
+
org's data warehouse. Use whenever the user asks to: add a connection,
|
|
6
|
+
set up a sync, ingest a table, pipe data from a source, configure streams,
|
|
7
|
+
pick which tables to ingest, or wire up a Source. The skill picks a sensible stream subset
|
|
8
|
+
(especially important for databases — most projects don't want every table)
|
|
9
|
+
and confirms the choice before writing YAML.
|
|
10
|
+
---
|
|
11
|
+
|
|
12
|
+
# Create Connection
|
|
13
|
+
|
|
14
|
+
## Purpose
|
|
15
|
+
|
|
16
|
+
Author a `Connection` YAML under `connections/<name>.yaml` that ingests selected streams from a Source into the org's data warehouse. Each connection is a complete, standalone document — `revos apply` reads it and reconciles with the API.
|
|
17
|
+
|
|
18
|
+
The hard part is **stream selection** and **sync-mode choice**, not the YAML shape. Most sources expose far more streams than a project actually needs; databases especially. Pick deliberately, confirm with the user, then write.
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## Prerequisites
|
|
23
|
+
|
|
24
|
+
- The Source already exists on the server (you'll get its `id` via `revos sources list`). If the user wants to ingest from a source that isn't created yet, stop and tell them to run `revos sources create` first — source configuration (connector picker, credentials, OAuth) lives in the RevOS UI.
|
|
25
|
+
- `revos auth status` shows authenticated. If not, ask the user to run `revos auth login`.
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## User Checkpoints
|
|
30
|
+
|
|
31
|
+
The skill makes two decisions that benefit from explicit confirmation. Don't skip them — wrong streams or wrong sync mode wastes warehouse storage and reload time.
|
|
32
|
+
|
|
33
|
+
### Checkpoint 1: Stream selection
|
|
34
|
+
|
|
35
|
+
After discovering streams and asking the user about their use case, propose a subset with one-line rationale per stream. Wait for confirmation or edits before moving on.
|
|
36
|
+
|
|
37
|
+
### Checkpoint 2: Sync-mode and key choices
|
|
38
|
+
|
|
39
|
+
After deriving sync mode + cursor + primary key for each selected stream, present the table and confirm before writing the file. Most users skim and approve; some will want to flip a stream to `full_refresh_overwrite` or change the cursor.
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
# Workflow
|
|
44
|
+
|
|
45
|
+
## Phase 1: Identify the Source
|
|
46
|
+
|
|
47
|
+
Run `revos sources list --json` and match on the server-side `name` (or run `revos sources list` for a scannable table). Record the source's `id` — that goes into the Connection YAML as `spec.source.id`.
|
|
48
|
+
|
|
49
|
+
If no source was named, ask the user which one. Don't proceed without an id.
|
|
50
|
+
|
|
51
|
+
## Phase 2: Discover streams
|
|
52
|
+
|
|
53
|
+
```bash
|
|
54
|
+
revos sources list-streams <id> --json
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
This returns an array of objects shaped like:
|
|
58
|
+
|
|
59
|
+
```json
|
|
60
|
+
{
|
|
61
|
+
"streamName": "customers",
|
|
62
|
+
"streamnamespace": "public",
|
|
63
|
+
"syncModes": ["full_refresh_overwrite", "incremental_deduped_history", ...],
|
|
64
|
+
"defaultCursorField": ["updated_at"],
|
|
65
|
+
"sourceDefinedCursorField": false,
|
|
66
|
+
"sourceDefinedPrimaryKey": [["id"]],
|
|
67
|
+
"propertyFields": [["id"], ["email"], ["updated_at"], ...]
|
|
68
|
+
}
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
Save the full response — you'll reference it in later phases. The `syncModes` array narrows what's valid for this stream against the project's destination; never propose a mode that isn't in this list.
|
|
72
|
+
|
|
73
|
+
Two things worth knowing about the shape:
|
|
74
|
+
|
|
75
|
+
- **Database sources usually leave `defaultCursorField` empty** even when they advertise incremental modes. They expose every column via `propertyFields` and let you pick. Plan to fall back to `propertyFields` for DB sources; SaaS sources typically pre-fill `defaultCursorField` with the right field.
|
|
76
|
+
- **`streamnamespace` is set for databases** (e.g. `public`, `dbo`) and absent for most SaaS sources. The YAML needs the `namespace:` line whenever the discovery response includes one — drop the line for streams that don't have it.
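
The namespace rule above can be sketched as a small mapping step. This is only an illustration: `toStreamEntry` and the `name` key are hypothetical, and the actual Connection YAML shape is defined elsewhere in the skill; only the `namespace` key and the discovery field names come from the text above.

```javascript
// Hypothetical helper: turns one discovery object into a stream entry.
// The `name` key is an assumption made for illustration; `namespace` is only
// emitted when the discovery response actually carries `streamnamespace`.
function toStreamEntry(discovered) {
  const entry = { name: discovered.streamName };
  if (discovered.streamnamespace) {
    entry.namespace = discovered.streamnamespace;
  }
  return entry;
}

// Database-style stream: namespace present, so it is carried over.
const dbEntry = toStreamEntry({ streamName: "customers", streamnamespace: "public" });

// SaaS-style stream: no namespace in discovery, so the key is omitted entirely.
const saasEntry = toStreamEntry({ streamName: "deals" });

console.log(dbEntry, saasEntry);
```

Omitting the key, rather than writing an empty value, mirrors the instruction to drop the line for streams that have no namespace.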
|
|
77
|
+
|
|
78
|
+
## Phase 3: Ask about the use case
|
|
79
|
+
|
|
80
|
+
Before proposing streams, ask **one short question** to anchor the selection. Examples:
|
|
81
|
+
|
|
82
|
+
- "What do you want to analyze with this connection? E.g. revenue per customer, support ticket trends, marketing funnel."
|
|
83
|
+
- "Which part of the source matters here — sales pipeline, finance records, product usage?"
|
|
84
|
+
|
|
85
|
+
Keep it open-ended; one sentence from the user is enough. The goal is to filter out obviously-irrelevant streams (audit logs, internal queues, system tables) and prioritize business entities. Skip this question only if the user already stated the goal in their initial request.
|
|
86
|
+
|
|
87
|
+
## Phase 4: Propose a stream subset (Checkpoint 1)
|
|
88
|
+
|
|
89
|
+
Apply these rules in order:
|
|
90
|
+
|
|
91
|
+
1. **Drop technical/system streams** unless the user's goal explicitly needs them. Names matching any of: starts with `pg_`, `information_schema`, `_airbyte`, `temp_`, `tmp_`, `audit_`, `system_`, `migration`, `schema_migrations`; ends with `_history`, `_log`, `_audit`, `_archive`. Be conservative — when in doubt, include it and flag it.
|
|
92
|
+
|
|
93
|
+
2. **For databases** (postgres / mysql / mssql / mongodb-style sources, recognizable by many streams and namespaces like `public`, `dbo`, `sales`): expect to drop 30–80% of streams. Project use cases rarely need every operational table. Prefer streams whose names match the user's goal (`orders`, `customers`, `products` for revenue analysis; `tickets`, `contacts`, `messages` for support).
|
|
94
|
+
|
|
95
|
+
3. **For SaaS sources** (small flat stream list, no namespace, names like `companies`, `deals`, `tickets`, `engagements`): include all core business entities by default. Drop only obvious noise (`*_metadata`, `*_history`, `*_changelog`).
|
|
96
|
+
|
|
97
|
+
4. **Be honest about uncertainty.** If you can't tell what a stream is from its name, say so — don't fabricate a rationale.
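
Rule 1 above is mechanical enough to sketch in code. The helper names below (`isSystemStream`, `partitionStreams`) are hypothetical and not part of the CLI; the prefix and suffix lists are copied from rule 1, and matching is assumed to be case-insensitive. In keeping with "when in doubt, include it and flag it", the sketch only flags candidates for the user to review and does not drop anything on its own.

```javascript
// Name patterns taken from rule 1; everything else here is illustrative.
const SYSTEM_PREFIXES = [
  "pg_", "information_schema", "_airbyte", "temp_", "tmp_",
  "audit_", "system_", "migration", "schema_migrations",
];
const SYSTEM_SUFFIXES = ["_history", "_log", "_audit", "_archive"];

// Returns true when a stream name looks technical/system by rule 1.
function isSystemStream(streamName) {
  const name = streamName.toLowerCase(); // assumption: case-insensitive match
  return (
    SYSTEM_PREFIXES.some((prefix) => name.startsWith(prefix)) ||
    SYSTEM_SUFFIXES.some((suffix) => name.endsWith(suffix))
  );
}

// Splits discovered streams into names to propose and names to flag for review.
function partitionStreams(streams) {
  const kept = [];
  const flagged = [];
  for (const stream of streams) {
    (isSystemStream(stream.streamName) ? flagged : kept).push(stream.streamName);
  }
  return { kept, flagged };
}

const result = partitionStreams([
  { streamName: "customers" },
  { streamName: "pg_stat_activity" },
  { streamName: "orders_history" },
  { streamName: "schema_migrations" },
  { streamName: "user_sessions" },
]);
console.log(result);
```

Rules 2 to 4 depend on the user's stated goal and on judgment about unfamiliar names, so they are deliberately left out of the sketch.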
|
|
98
|
+
|
|
99
|
+
Present the proposal as a table the user can scan:
|
|
100
|
+
|
|
101
|
+
```
|
|
102
|
+
Proposed streams (12 of 47):
|
|
103
|
+
|
|
104
|
+
customers core entity — revenue analysis needs this
|
|
105
|
+
orders transactions table; cursor: updated_at
|
|
106
|
+
order_items order line items; needed to break revenue by SKU
|
|
107
|
+
products dimension table for orders/order_items
|
|
108
|
+
...
|
|
109
|
+
|
|
110
|
+
Dropped (35): pg_stat_*, _airbyte_internal_*, audit_*, schema_migrations,
|
|
111
|
+
user_sessions (not in scope for revenue analysis), ...
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
Ask: "Look right? Anything to add or remove?" Wait for the user. They might say "add `users`" or "drop `products`, we don't need it" — adjust.
|
|
115
|
+
|
|
116
|
+
## Phase 5: Determine sync mode, cursor, and primary key per stream
|
|
117
|
+
|
|
118
|
+
For each selected stream, the choice is driven by what the source supports (`syncModes`) and what fields it advertises. The four sync modes you'll use:

| Sync mode                        | Requires                   | When to use                                                                                                                       |
| -------------------------------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `incremental_deduped_history`    | cursor **and** primary key | Best default. Source advertises a cursor (e.g. `updated_at`) and a PK. Only new/changed rows pull; duplicates collapse on PK.     |
| `incremental_append`             | cursor only                | Source has a cursor but no PK (event streams, append-only logs). Rows accumulate; no deduplication.                               |
| `full_refresh_overwrite_deduped` | primary key only           | Source has a PK but no usable cursor. Each sync replaces the destination table, deduplicated by PK. Fine for small/medium tables. |
| `full_refresh_overwrite`         | nothing                    | Small dimension tables (<10k rows), or when nothing else works. Each sync overwrites.                                             |

Decision algorithm per stream:

1. Pick the **strongest mode the source supports**, in this priority order: `incremental_deduped_history` → `incremental_append` → `full_refresh_overwrite_deduped` → `full_refresh_overwrite`. Skip any mode not in the stream's `syncModes` array.
2. For modes requiring a cursor: use `defaultCursorField` if non-empty. Otherwise look at `propertyFields` for the first available timestamp-looking field (`updated_at`, `modified_at`, `last_modified`, `_updated`, `timestamp`). If nothing fits, drop down to a full-refresh mode.
3. For modes requiring a primary key: use `sourceDefinedPrimaryKey` if non-empty. Otherwise look in `propertyFields` for an `id`-like field (`id`, `<entity>_id`, `uuid`, `key`). If nothing fits, drop down to `incremental_append` or `full_refresh_overwrite`.
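
The decision algorithm above could be sketched as follows. This is an illustrative Python sketch, not the CLI's implementation. It assumes the discovery response for a stream is a dict with `name`, `syncModes`, `defaultCursorField`, `sourceDefinedPrimaryKey`, and a `propertyFields` list of top-level field names; the real shape of `propertyFields` may differ. The helper names and the naive singularization used for `<entity>_id` are hypothetical.

```python
# (mode, needs_cursor, needs_pk), strongest first.
PRIORITY = [
    ("incremental_deduped_history", True, True),
    ("incremental_append", True, False),
    ("full_refresh_overwrite_deduped", False, True),
    ("full_refresh_overwrite", False, False),
]
CURSOR_HINTS = {"updated_at", "modified_at", "last_modified", "_updated", "timestamp"}


def pick_cursor(stream):
    """defaultCursorField if non-empty, else the first timestamp-looking field."""
    if stream.get("defaultCursorField"):
        return list(stream["defaultCursorField"])
    for field in stream.get("propertyFields") or []:
        if field in CURSOR_HINTS:
            return [field]
    return None


def pick_primary_key(stream):
    """sourceDefinedPrimaryKey if non-empty, else an id-like field."""
    if stream.get("sourceDefinedPrimaryKey"):
        return [list(path) for path in stream["sourceDefinedPrimaryKey"]]
    name = stream["name"]
    singular = name[:-1] if name.endswith("s") else name  # naive: customers -> customer
    fields = stream.get("propertyFields") or []
    for candidate in ("id", f"{singular}_id", "uuid", "key"):
        if candidate in fields:
            return [[candidate]]
    return None


def choose_sync_mode(stream):
    """Return the strongest supported mode whose requirements can be met."""
    cursor = pick_cursor(stream)
    primary_key = pick_primary_key(stream)
    for mode, needs_cursor, needs_pk in PRIORITY:
        if mode not in stream["syncModes"]:
            continue
        if needs_cursor and cursor is None:
            continue
        if needs_pk and primary_key is None:
            continue
        return {
            "syncMode": mode,
            "cursorField": cursor if needs_cursor else None,
            "primaryKey": primary_key if needs_pk else None,
        }
    return None  # nothing fits; surface this to the user
```

Walking the priority list once and skipping any mode whose requirement is unmet gives the "drop down" behavior from steps 2 and 3 without separate fallback branches.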

Use compact notation when presenting (Checkpoint 2):

```
Stream        Sync mode                     Cursor       Primary key
customers     incremental_deduped_history   updated_at   id
orders        incremental_deduped_history   updated_at   id
order_items   incremental_append            created_at   —
products      full_refresh_overwrite        —            —
```

Ask: "Sync modes look right? Any changes?" Common edits: moving a table down to `full_refresh_overwrite` (reasonable only if the table is small/static) or switching the cursor field.

## Phase 6: Write the YAML

Two names to pick, with different rules:

- **`metadata.name`** — the local IaC slug. Filename-friendly: lowercase, alphanumerics + hyphens. Anchors the YAML file (`connections/<metadata.name>.yaml`) and how other resources refer to this connection. Derive from the source and scope: `hubspot-sales`, `postgres-prod-revenue`, `stripe-billing`.
- **`spec.name`** — the human-readable label shown in the RevOS UI. Short, sentence case, spaces fine, no underscores. Describe what the connection syncs so a teammate scanning the UI knows without opening it. Patterns that work: `"HubSpot sales pipeline"`, `"Postgres prod → revenue tables"`, `"Stripe billing & subscriptions"`. Avoid reusing the slug here — `hubspot-sales` is a worse `spec.name` than `HubSpot sales pipeline`.
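
The slug rule for `metadata.name` can be checked mechanically. The Python sketch below is one reading of "lowercase, alphanumerics + hyphens"; the regex and the helper names `to_slug` and `is_valid_slug` are assumptions, since this guide does not show the CLI's exact validation rule.

```python
import re

# Assumed shape: lowercase alphanumeric groups joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def to_slug(text):
    """Lowercase, collapse runs of other characters into one hyphen, trim hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def is_valid_slug(name):
    """True when the whole string matches the assumed slug shape."""
    return SLUG_RE.fullmatch(name) is not None
```

`to_slug` is a convenience for turning a short "source + scope" phrase into a candidate; the result still deserves a quick human check so that the slug and the `spec.name` label stay distinct.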

If a file already exists at the target path, ask whether to overwrite or pick a different name.

Template:

```yaml
apiVersion: revos/v1
kind: Connection
metadata:
  name: hubspot-sales
spec:
  name: HubSpot sales pipeline
  source: { id: <id from `revos sources list`> }
  schedule: { units: 24, timeUnit: hours }
  status: active
  streams:
    - name: customers
      namespace: public # omit if the stream has no namespace
      syncMode: incremental_deduped_history
      cursorField: [updated_at]
      primaryKey: [[id]]
```

Notes:

- `source.id` is the server id of the Source (e.g. `src_abc123`). Sources are not IaC — get the id with `revos sources list`.
- `primaryKey` is a list of lists — each inner list is one PK column path, supporting nested keys. In most cases it's `[[id]]`.
- `cursorField` is a flat list of path segments. Top-level field is `[updated_at]`; nested would be `[meta, modified_at]`.
- Omit `cursorField` and `primaryKey` for `full_refresh_overwrite`. Omit only `cursorField` for `full_refresh_overwrite_deduped`. CLI validation rejects missing required fields at apply time, so keep this clean.
- Don't set `metadata.id` or `spec.prefix` — both are filled in by `revos apply` on first create.
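
The per-mode field rules in these notes can be captured in a small builder. This is a hedged Python sketch: `stream_entry` and the two lookup sets are hypothetical helpers, and the returned dict simply mirrors one item of the `streams` list from the template above.

```python
NEEDS_CURSOR = {"incremental_deduped_history", "incremental_append"}
NEEDS_PK = {"incremental_deduped_history", "full_refresh_overwrite_deduped"}


def stream_entry(name, sync_mode, cursor_field=None, primary_key=None, namespace=None):
    """Build one streams[] entry, keeping only the fields the mode uses."""
    entry = {"name": name}
    if namespace:  # omit when the stream has no namespace
        entry["namespace"] = namespace
    entry["syncMode"] = sync_mode
    if sync_mode in NEEDS_CURSOR:
        if not cursor_field:
            raise ValueError(f"{sync_mode} requires cursorField")
        entry["cursorField"] = list(cursor_field)  # flat list of path segments
    if sync_mode in NEEDS_PK:
        if not primary_key:
            raise ValueError(f"{sync_mode} requires primaryKey")
        entry["primaryKey"] = [list(path) for path in primary_key]  # list of paths
    return entry
```

Passing a cursor or primary key alongside `full_refresh_overwrite` is harmless here: the builder drops them, which keeps the written YAML free of fields that do not apply to the chosen mode.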

**Stream mappers** (`streams[].mappers`) are server-side transformations applied before rows land in BigQuery — hashing PII, renaming columns, dropping fields, filtering rows, encrypting values. Default to a clean sync without them. Load [references/mappers.md](references/mappers.md) when either:

- The user mentions masking, hashing, PII, renaming, dropping a column, filtering rows, or encryption.
- A selected stream's `propertyFields` includes obviously sensitive columns (`email`, `phone`, `ssn`, `ip_address`, full-name pairs, payment fields). In that case load the reference, then proactively suggest a mapper with the field name and rationale, and ask the user to confirm before adding it. Don't quietly insert mappers — masking the wrong field corrupts analysis.

## Phase 7: Validate locally

```bash
revos diff
```

The CLI parses the YAML through zod and reports what would change. Look for:

- **Parse errors** — schema rejections (missing required fields, invalid sync mode). Fix and re-run.
- **Drift report** — the new connection should appear as a single `create` entry.

If `revos diff` is clean, hand back: "Connection YAML written to `connections/<slug>.yaml` and validated. Run `revos apply` to create it on the server."

If validation fails, share the error verbatim and fix before declaring success.

---

# Common pitfalls

- **Picking a sync mode the source doesn't support.** Always intersect your choice with the stream's `syncModes` array from Phase 2. The API validates this server-side; better to catch it locally.
- **Using a source name where the server id is expected.** `spec.source.id` is the server id (e.g. `src_abc123`), not a display name or a slug. Always pull it from `revos sources list`.
- **Setting `cursorField` on a `full_refresh_*` mode.** CLI validation tolerates extra fields but it's noise. Omit fields that don't apply to the chosen mode.
- **Proposing every stream.** Especially for databases. If the user says "all of them" after seeing the proposal, fine — but offer the curated list first.
- **Inventing fields.** Only use `cursorField` and `primaryKey` values that appear in the discovery response (`defaultCursorField`, `sourceDefinedPrimaryKey`, or `propertyFields`).
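
Several of these pitfalls can be caught before running `revos diff`. The Python sketch below is a hypothetical pre-flight check, not something the CLI provides: `lint_stream` is an invented name, and it assumes the discovery response exposes `syncModes`, `defaultCursorField`, `sourceDefinedPrimaryKey`, and a `propertyFields` list of top-level field names.

```python
def lint_stream(entry, discovered):
    """Return a list of human-readable problems for one streams[] entry."""
    problems = []
    mode = entry["syncMode"]
    if mode not in discovered["syncModes"]:
        problems.append(f"{mode} is not in the stream's syncModes")
    if mode.startswith("full_refresh") and "cursorField" in entry:
        problems.append("cursorField set on a full_refresh_* mode")

    # Top-level field names the discovery response actually advertises.
    known = set(discovered.get("propertyFields") or [])
    known.update((discovered.get("defaultCursorField") or [])[:1])
    known.update(path[0] for path in discovered.get("sourceDefinedPrimaryKey") or [] if path)

    # Top-level segments referenced by the entry.
    used = list((entry.get("cursorField") or [])[:1])
    used += [path[0] for path in entry.get("primaryKey") or [] if path]
    for field in used:
        if field not in known:
            problems.append(f"{field} does not appear in the discovery response")
    return problems
```

An empty list means the entry passed these three checks only; `revos diff` remains the authoritative validation.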
@@ -0,0 +1,152 @@
# Stream mappers

Stream mappers are **server-side transformations** applied per stream before rows land in BigQuery. They run inside the sync pipeline, so even a direct BigQuery query sees only the transformed values — no way for downstream analysts to bypass them.

Five mapper types are supported. They're all opt-in: a clean sync needs none of them. Add them only when the user asks for masking, renaming, dropping, filtering, or encryption.

| Type              | Use when                                                                                                            |
| ----------------- | ------------------------------------------------------------------------------------------------------------------- |
| `hashing`         | One-way obfuscation of PII (email, name, IP). Analysts get a joinable but irreversible identifier.                  |
| `field-renaming`  | Rename a column at ingest. Useful when the source field name leaks sensitivity (`ssn` → `ssn_redacted`).            |
| `field-filtering` | Drop a column entirely. Use when the value should never reach the warehouse (notes, internal flags, free-text PII). |
| `row-filtering`   | Drop entire rows that match a predicate. Use to exclude test/demo data, soft-deleted rows, internal accounts.       |
| `encryption`      | Reversible obfuscation. Use when a process downstream of BigQuery needs to recover the plaintext.                   |

Mappers live under `streams[].mappers` as an ordered list. The configuration body always sits under `mapperConfiguration` — that's the schema shape, not a stylistic choice.

---

## When to proactively suggest a mapper

The skill should propose a mapper without being asked when the stream's `propertyFields` includes anything obviously sensitive:

- **Hash, don't store plaintext**: `email`, `email_address`, `phone`, `phone_number`, `ip_address`, `ssn`, `tax_id`, `national_id`, `passport`, `dob`, `date_of_birth`, `full_name`, `first_name + last_name` pairs.
- **Drop entirely**: free-text fields whose names hint at user-entered content (`notes`, `comments`, `internal_notes`, `description`) — unless the user said they need them for analysis.
- **Encrypt (reversible)**: payment-method fields (`card_number`, `iban`, `account_number`) when the user has indicated a downstream process needs the original values.

When suggesting, name the field and the mapper type, explain why, and ask the user to confirm or skip. Never quietly insert a mapper the user didn't approve — masking the wrong field corrupts analysis. Be especially conservative for `encryption`: a missing `key` value makes the YAML pass schema validation but the sync fails at runtime.
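
The field-name triggers above lend themselves to a lookup. The Python sketch below only produces candidate suggestions to put in front of the user; the function name `suggest_mappers`, the `needs_plaintext_downstream` flag, and the assumption that `propertyFields` is a flat list of column names are all illustrative, not part of the CLI.

```python
HASH_FIELDS = {
    "email", "email_address", "phone", "phone_number", "ip_address", "ssn",
    "tax_id", "national_id", "passport", "dob", "date_of_birth", "full_name",
}
DROP_FIELDS = {"notes", "comments", "internal_notes", "description"}
ENCRYPT_FIELDS = {"card_number", "iban", "account_number"}


def suggest_mappers(property_fields, needs_plaintext_downstream=False):
    """Return (field, mapper_type) suggestions; nothing is applied automatically."""
    suggestions = []
    for field in property_fields:
        if field in HASH_FIELDS:
            suggestions.append((field, "hashing"))
        elif field in DROP_FIELDS:
            suggestions.append((field, "field-filtering"))
        elif field in ENCRYPT_FIELDS and needs_plaintext_downstream:
            suggestions.append((field, "encryption"))
    # first_name and last_name are only sensitive as a pair.
    if {"first_name", "last_name"} <= set(property_fields):
        suggestions += [("first_name", "hashing"), ("last_name", "hashing")]
    return suggestions
```

Each returned pair maps to one question for the user, for example "hash `email` with SHA-256?". Payment fields are suggested for encryption only when the user has said a downstream process needs the original values, matching the rule above.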

---

## Hashing

One-way hash that preserves joinability (the same input always produces the same hash). Default to `SHA-256` — it's the safest choice unless the user has a reason to prefer something else.

```yaml
streams:
  - name: customers
    namespace: public
    syncMode: incremental_deduped_history
    cursorField: [updated_at]
    primaryKey: [[id]]
    mappers:
      - type: hashing
        mapperConfiguration:
          targetField: email
          method: SHA-256
          fieldNameSuffix: _hashed
```

- `targetField`: the source column to hash. Must match a field in `propertyFields`.
- `method`: one of `MD2`, `MD5`, `SHA-1`, `SHA-224`, `SHA-256`, `SHA-384`, `SHA-512`. MD2, MD5, and SHA-1 are weak — only use them if the user explicitly asks (e.g. matching an existing hashed dataset).
- `fieldNameSuffix`: appended to the column name in the destination. So `email` becomes `email_hashed` in BigQuery. Omit if the user wants to replace the column in place — but suffix is the safer default since it makes the transformation visible.
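
Why a hashed column stays joinable can be shown locally with Python's `hashlib`. This sketch illustrates the property only. How the server-side mapper encodes its output (hex or otherwise), and whether it normalizes or salts values, is not described in this guide, so the digests below should not be assumed to match what lands in BigQuery.

```python
import hashlib


def sha256_hex(value):
    """SHA-256 of a UTF-8 string, as 64 lowercase hex characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# The same plaintext in two tables hashes to the same value, so joins still work.
customers = [{"id": 1, "email_hashed": sha256_hex("ada@example.com")}]
tickets = [{"ticket": 77, "email_hashed": sha256_hex("ada@example.com")}]
joined = [
    (c["id"], t["ticket"])
    for c in customers
    for t in tickets
    if c["email_hashed"] == t["email_hashed"]
]
```

Hashing is exact-match: `Ada@example.com` and `ada@example.com` produce different digests, so inconsistent casing in the source breaks the join.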

---

## Field renaming

Rename a column on its way into the warehouse. Use when the source name itself is sensitive (column called `ssn` becomes `ssn_redacted` after a separate hashing/filtering step) or when the source uses awkward names you want analysts to see clean.

```yaml
mappers:
  - type: field-renaming
    mapperConfiguration:
      originalFieldName: ssn
      newFieldName: ssn_redacted
```

To combine renaming with hashing, list the hashing mapper first and the rename second: hash to `ssn_hashed`, then rename if needed.

---

## Field filtering

Drop a column entirely from the destination table. The value never lands in BigQuery.

```yaml
mappers:
  - type: field-filtering
    mapperConfiguration:
      targetField: internal_notes
```

Prefer this over hashing when the value has no analytical use — there's no reason to store a hashed `internal_notes` column when you can just drop it.

---

## Row filtering

Drop rows that match a predicate. Use to exclude test data, soft-deleted records, internal accounts, etc.

The `conditions` tree uses nested boolean operators (`AND`, `OR`, `NOT`) and leaf comparisons (`EQUAL`, `NOT_EQUAL`, others). **The predicate keeps rows where it evaluates to true** — so to drop rows, wrap the match condition in `NOT`.

```yaml
mappers:
  # Keep rows where is_test != "true". Test rows get dropped.
  - type: row-filtering
    mapperConfiguration:
      conditions:
        type: NOT
        conditions:
          - type: EQUAL
            fieldName: is_test
            comparisonValue: "true"
```

Gotchas:

- **Compare against real sentinel values**, not empty strings or nulls — empty-string comparisons behave unreliably across source connectors.
- **Boolean fields**: most sources serialize booleans as strings on the wire, so the `comparisonValue` is `"true"` / `"false"`, not `true` / `false`. Confirm against `propertyFields` if unsure.
- **The keep-vs-drop direction is easy to flip.** Re-read the predicate after writing it: "this filter keeps rows where \_\_\_, so it drops rows where \_\_\_."
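
One way to double-check the keep-vs-drop direction is to evaluate the predicate against a couple of sample rows. The Python sketch below is a local approximation, not the server's evaluator. It assumes the dict shape shown in the YAML example, handles only `AND`, `OR`, `NOT`, `EQUAL`, and `NOT_EQUAL`, and treats `NOT` as negating its first child.

```python
def evaluate(condition, row):
    """Return True when the row satisfies the condition tree (the row is kept)."""
    kind = condition["type"]
    if kind == "AND":
        return all(evaluate(child, row) for child in condition["conditions"])
    if kind == "OR":
        return any(evaluate(child, row) for child in condition["conditions"])
    if kind == "NOT":
        # Assumption: NOT wraps a single child, as in the example above.
        return not evaluate(condition["conditions"][0], row)
    if kind == "EQUAL":
        return row.get(condition["fieldName"]) == condition["comparisonValue"]
    if kind == "NOT_EQUAL":
        return row.get(condition["fieldName"]) != condition["comparisonValue"]
    raise ValueError(f"unsupported condition type: {kind}")


def apply_row_filter(rows, conditions):
    """Keep rows where the predicate is true; everything else is dropped."""
    return [row for row in rows if evaluate(conditions, row)]


drop_test_rows = {
    "type": "NOT",
    "conditions": [
        {"type": "EQUAL", "fieldName": "is_test", "comparisonValue": "true"},
    ],
}
sample = [{"id": 1, "is_test": "true"}, {"id": 2, "is_test": "false"}]
kept = apply_row_filter(sample, drop_test_rows)
```

Here `kept` contains only the row with `id` 2. Removing the `NOT` wrapper flips the result and keeps only the test row, which is the mistake the last gotcha warns about. The sample rows use string booleans on purpose, following the note on wire serialization.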

---

## Encryption

Reversible obfuscation. Use only when a process downstream of BigQuery genuinely needs to recover the plaintext — otherwise prefer hashing or filtering.

```yaml
mappers:
  - type: encryption
    mapperConfiguration:
      algorithm: AES
      targetField: card_number
      fieldNameSuffix: _enc
      key: ${env.AES_KEY}
      mode: GCM
      padding: NoPadding
```

- `algorithm`: `RSA` or `AES`.
- `mode` (AES only): `CBC`, `CFB`, `OFB`, `CTR`, `GCM`, `ECB`. Default to `GCM` — it's authenticated and resistant to tampering. Only fall back to others if the user has a stated reason.
- `padding` (AES only): `NoPadding` or `PKCS5Padding`. With `GCM`/`CTR`/`CFB`/`OFB`, use `NoPadding`. With `CBC`/`ECB`, use `PKCS5Padding`.
- `key`: the encryption key. Reference an environment variable with `${env.VAR_NAME}` — never hardcode the key in YAML. Tell the user they need to set the env var in the project's deployment config.
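
The mode/padding pairing and the key rule above are easy to get wrong by hand, so a small check helps. This Python sketch is a hypothetical helper, not CLI behavior: `padding_for` and `check_aes_config` are invented names, and the input is assumed to be the `mapperConfiguration` body as a dict.

```python
NO_PADDING_MODES = {"GCM", "CTR", "CFB", "OFB"}
PKCS5_MODES = {"CBC", "ECB"}


def padding_for(mode):
    """Padding value this guide recommends for an AES mode."""
    if mode in NO_PADDING_MODES:
        return "NoPadding"
    if mode in PKCS5_MODES:
        return "PKCS5Padding"
    raise ValueError("unknown AES mode: " + str(mode))


def check_aes_config(config):
    """Return a list of problems for an AES mapperConfiguration dict."""
    problems = []
    key = config.get("key") or ""
    # A missing or literal key passes schema validation but is still wrong.
    if not (key.startswith("${env.") and key.endswith("}")):
        problems.append("key should reference an environment variable, e.g. ${env.AES_KEY}")
    mode = config.get("mode")
    if mode is not None and config.get("padding") != padding_for(mode):
        problems.append("padding for " + mode + " should be " + padding_for(mode))
    return problems
```

Running this before writing the YAML catches the missing-`key` case early; as noted above, that case passes schema validation and only fails when the sync runs.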

---

## Ordering

Mappers in `streams[].mappers` run in array order. When chaining (e.g. hash then rename), put them in the order you want them applied. Reordering changes the result — hashing after renaming targets the new field name, which is rarely what you want.

A safe pattern when hashing a sensitive column and dropping its plaintext:

```yaml
mappers:
  - type: hashing
    mapperConfiguration:
      { targetField: email, method: SHA-256, fieldNameSuffix: _hashed }
  - type: field-filtering
    mapperConfiguration: { targetField: email } # drop the plaintext column
```

This produces only `email_hashed` in BigQuery — the original `email` never lands.