@revos/cli 0.1.0 → 0.1.1
- package/bin/revos.js +1 -1
- package/dist/adapters/oclif/commands/auth/login.mjs +2 -2
- package/dist/adapters/oclif/commands/auth/logout.mjs +2 -2
- package/dist/adapters/oclif/commands/auth/status.mjs +2 -2
- package/dist/adapters/oclif/commands/init.mjs +5 -4
- package/dist/adapters/oclif/commands/org/current.mjs +3 -3
- package/dist/adapters/oclif/commands/org/list.mjs +3 -3
- package/dist/adapters/oclif/commands/org/switch.mjs +3 -3
- package/dist/adapters/oclif/commands/overlays/diff.mjs +3 -3
- package/dist/adapters/oclif/commands/overlays/pull.mjs +3 -3
- package/dist/adapters/oclif/commands/overlays/push.mjs +3 -3
- package/dist/adapters/oclif/commands/overlays/status.mjs +3 -3
- package/dist/{base.command-BGM225ik.mjs → base.command-DlYMawJ6.mjs} +1 -1
- package/dist/{core-Bif-kxlo.mjs → core-Dq15hO6f.mjs} +70 -208
- package/dist/{index-C0e8MXGP.d.mts → index-DuqD2b_7.d.mts} +2 -8
- package/dist/index.d.mts +1 -1
- package/dist/index.mjs +1 -1
- package/dist/templates/.devcontainer/Dockerfile +14 -0
- package/dist/templates/.devcontainer/devcontainer.json +54 -0
- package/dist/templates/.devcontainer/setup.sh +32 -0
- package/dist/templates/AGENTS.md +2 -3
- package/dist/templates/CLAUDE.md +0 -16
- package/dist/templates/README.md +23 -0
- package/dist/templates/dbt/dbt_project.yml +22 -0
- package/dist/templates/index.ts +4 -0
- package/dist/templates/skills/create-semantic-model/SKILL.md +1611 -0
- package/dist/templates/skills/explore-lakehouse/SKILL.md +131 -0
- package/package.json +1 -3
@@ -0,0 +1,131 @@
+---
+name: explore-lakehouse
+description: Explore the RevOS BigQuery lakehouse — list datasets, list tables, inspect table schemas, preview sample rows, assess data layer (bronze/silver/gold), and check data completeness/null rates. Use when asked to explore the lakehouse, understand available raw data, inspect table structure, assess data quality, or check what's been built in dbt.
+---
+
+# Explore Lakehouse
+
+Use `bq` CLI to explore the BigQuery lakehouse for this project.
+
+## Environment
+
+Resolve connection details from env vars before running any command:
+
+```bash
+echo "Project: $GOOGLE_CLOUD_PROJECT"
+echo "Dataset: $REVOS_BQ_DATASET"
+```
+
+- `$GOOGLE_CLOUD_PROJECT` — BQ project ID
+- `$REVOS_BQ_DATASET` — default dataset (may be overridden by user)
+- `INFORMATION_SCHEMA` queries: omit `--location` flag — use plain `bq query --nouse_legacy_sql`
+
+## Commands
+
+List tables in the org's dataset:
+
+```bash
+bq ls $REVOS_BQ_DATASET
+```
+
+List all datasets in the project (only if the user explicitly asks):
+
+```bash
+bq ls --project_id=$GOOGLE_CLOUD_PROJECT
+```
+
+Inspect a table schema (filter out internal columns):
+
+```bash
+bq show --schema --format=prettyjson $REVOS_BQ_DATASET.<table> | python3 -c "
+import json, sys
+cols = json.load(sys.stdin)
+names = [c['name'] for c in cols if not c['name'].startswith('_airbyte')]
+print('\n'.join(names))
+"
+```
+
+Preview sample rows:
+
+```bash
+bq head -n 5 $REVOS_BQ_DATASET.<table>
+```
+
+Get row counts for a list of tables:
+
+```bash
+for table in table1 table2 table3; do
+  echo -n "$table: "
+  bq query --nouse_legacy_sql --format=csv \
+    "SELECT COUNT(*) FROM \`$GOOGLE_CLOUD_PROJECT.$REVOS_BQ_DATASET.$table\`" 2>/dev/null | tail -1
+done
+```
+
+Check null rates on a set of columns:
+
+```bash
+bq query --nouse_legacy_sql "
+SELECT
+  COUNTIF(col1 IS NULL) AS col1_null,
+  COUNTIF(col2 IS NULL) AS col2_null,
+  COUNT(*) AS total
+FROM \`$GOOGLE_CLOUD_PROJECT.$REVOS_BQ_DATASET.<table>\`
+"
+```
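
Reviewer sketch, not part of the published package: the null-rate query above hard-codes `col1` and `col2`. One way to generate the same query shape for an arbitrary column list is shown below. The helper name `build_null_rate_query` and its parameters are hypothetical, and the backtick quoting of column names is an assumption for columns that need escaping.

```python
def build_null_rate_query(project: str, dataset: str, table: str, columns: list[str]) -> str:
    """Build one COUNTIF(col IS NULL) query covering every column in `columns`.

    Hypothetical helper: mirrors the shape of the SKILL.md example query,
    with one `<col>_null` counter per column plus a `total` row count.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # One null counter per requested column, aliased as <col>_null.
    select_items = [f"COUNTIF(`{col}` IS NULL) AS `{col}_null`" for col in columns]
    # Total row count, so a percentage can be derived afterwards.
    select_items.append("COUNT(*) AS total")
    select_clause = ",\n  ".join(select_items)
    return f"SELECT\n  {select_clause}\nFROM `{project}.{dataset}.{table}`"
```

The returned string can be passed as the query argument to `bq query --nouse_legacy_sql`, using column names taken from the table's schema rather than from memory.
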
+
+## Workflows
+
+### "What's in my database?" / general overview
+
+1. List tables in the org's dataset: `bq ls $REVOS_BQ_DATASET`
+2. Infer the data source and domain from table name prefixes (e.g. `salesforce_*`, `stripe_*`, `hubspot_*`)
+3. Group tables by source/domain
+4. Return: sources found, table count per source, table types (TABLE/VIEW), one-line description per group
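
Reviewer sketch, not part of the published package: steps 2 and 3 of the overview workflow amount to bucketing table names by their leading prefix. The function name `group_tables_by_source` is hypothetical. Treating everything before the first underscore as the source, and the `(no prefix)` bucket for names without an underscore, are assumptions that would need adjusting for multi-word source names.

```python
from collections import defaultdict


def group_tables_by_source(table_names):
    """Group table names by the text before their first underscore.

    Hypothetical helper: `stripe_charges` lands under `stripe`; names
    without an underscore land under the placeholder "(no prefix)".
    """
    groups = defaultdict(list)
    for name in table_names:
        prefix, sep, _rest = name.partition("_")
        source = prefix if sep else "(no prefix)"
        groups[source].append(name)
    # Sort sources and table names for stable, readable output.
    return {source: sorted(names) for source, names in sorted(groups.items())}
```

The table count per source asked for in step 4 is then `len(names)` for each group.
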
+
+### "What layer is this data?" / bronze–silver–gold assessment
+
+1. Check dbt model folders: `find dbt/models -type f | sort`
+2. If folders contain only `.gitkeep` → that layer hasn't been built yet
+3. Assess the tables themselves:
+   - **Bronze indicators:** raw source-prefixed names, many flat columns with source-system naming (e.g. `properties_*`, `fields_*`), no aggregations, no joins visible in schema
+   - **Silver indicators:** cleaned column names, deduplicated, conformed types, `_id` foreign keys
+   - **Gold indicators:** aggregated metrics, wide fact tables, business-named columns (`arr`, `churn_rate`, `ltv`)
+4. Report which layers exist and which are missing
+
+### "Is data complete?" / data quality check
+
+1. Get all tables: `bq ls <dataset>`
+2. For each table, fetch its schema to discover actual column names
+3. Identify the business-critical columns from the schema — look for:
+   - **Identity/key columns:** anything named `id`, `*_id`, `email`, `name`
+   - **Date columns:** `created_at`, `*_date`, `*_at`
+   - **Relationship columns:** foreign keys linking to other objects
+   - **Core metric columns:** amounts, statuses, stages, owner assignments
+4. Run a single `COUNTIF(col IS NULL)` query per table covering those columns
+5. Where a source uses an `archived` / `is_deleted` flag, filter it out: `WHERE archived = false OR archived IS NULL`
+6. Present results per table:
+
+   | Field | Nulls | % Missing | Status |
+   | ----- | ----- | --------- | ------------ |
+   | ... | ... | ...% | ✅ / ⚠️ / ❌ |
+
+   Status thresholds: ✅ < 5% · ⚠️ 5–50% · ❌ > 50%
+
+7. Summarise findings: which tables are well-populated, which have critical gaps, and what that means for downstream use
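
Reviewer sketch, not part of the published package: the status thresholds in step 6 can be turned into a small classifier. The function name `null_status` and the constant names are hypothetical. Reading the 5–50% band as inclusive at both ends follows the stated ranges, and raising on an empty table is an assumption, since the workflow does not say how to report a table with zero rows.

```python
OK, WARN, FAIL = "✅", "⚠️", "❌"


def null_status(nulls: int, total: int) -> tuple[float, str]:
    """Return (% missing, status marker) for one column.

    Thresholds as stated in the workflow: below 5% is OK, 5% to 50%
    inclusive is WARN, above 50% is FAIL.
    """
    if total <= 0:
        # Assumption: an empty table has no meaningful null rate.
        raise ValueError("total must be positive")
    if not 0 <= nulls <= total:
        raise ValueError("nulls must be between 0 and total")
    pct = 100.0 * nulls / total
    if pct < 5:
        status = OK
    elif pct <= 50:
        status = WARN
    else:
        status = FAIL
    return pct, status
```

Each `<col>_null` value from the per-table query, paired with `total`, gives one row of the results table.
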
+
+### "What's in a specific table?"
+
+1. `bq show --schema` — get full column list (omit `_airbyte_*` columns)
+2. `SELECT COUNT(*)` — row count
+3. `bq head -n 5` — sample rows
+4. Identify and highlight the most important business columns from the schema
+
+## Output rules
+
+- Never mention Airbyte — it is an internal ETL mechanism invisible to users
+- Do not reference `_airbyte_*` columns by name or explain their origin; omit them from schema summaries
+- Describe data freshness neutrally: "updated daily" not "partitioned on `_airbyte_extracted_at`"
+- Do not use phrases like "via Airbyte", "Airbyte's flat-column pattern", or "Airbyte artifact"
+- Always discover table structure dynamically from the schema — never assume column names from a previous session
+- Group output by integration source when listing many tables
+- Note small row counts (< 100 rows) as a possible indicator of sandbox or test data
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@revos/cli",
-  "version": "0.1.0",
+  "version": "0.1.1",
   "description": "RevOS CLI for managing overlays and other resources",
   "type": "module",
   "main": "dist/index.js",
@@ -37,14 +37,12 @@
     "@oclif/core": "^4.2.10",
     "@oclif/table": "^0.5.4",
     "chalk": "^4.1.2",
-    "express": "^4.21.1",
     "open": "^10.1.0",
     "@revos/api-client": "0.1.0"
   },
   "devDependencies": {
     "@swc/core": "^1.7.26",
     "@swc/jest": "^0.2.36",
-    "@types/express": "^4.17.21",
     "@types/jest": "^29.5.13",
     "@types/node": "^22.5.4",
     "jest": "^29.7.0",