vaultkit 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (36)
  1. checksums.yaml +7 -0
  2. data/README.md +961 -0
  3. data/bin/funl +0 -0
  4. data/bin/vkit +30 -0
  5. data/lib/vkit/cli/api/client.rb +115 -0
  6. data/lib/vkit/cli/base_cli.rb +173 -0
  7. data/lib/vkit/cli/commands/approval_command.rb +94 -0
  8. data/lib/vkit/cli/commands/base_command.rb +42 -0
  9. data/lib/vkit/cli/commands/datasource_command.rb +93 -0
  10. data/lib/vkit/cli/commands/fetch_command.rb +48 -0
  11. data/lib/vkit/cli/commands/login_command.rb +136 -0
  12. data/lib/vkit/cli/commands/logout_command.rb +12 -0
  13. data/lib/vkit/cli/commands/policy_bundle_command.rb +62 -0
  14. data/lib/vkit/cli/commands/policy_deploy_command.rb +32 -0
  15. data/lib/vkit/cli/commands/policy_validate_command.rb +31 -0
  16. data/lib/vkit/cli/commands/request_command.rb +102 -0
  17. data/lib/vkit/cli/commands/requests_list_command.rb +47 -0
  18. data/lib/vkit/cli/commands/scan_command.rb +47 -0
  19. data/lib/vkit/cli/commands/whoami_command.rb +14 -0
  20. data/lib/vkit/cli/commands.rb +5 -0
  21. data/lib/vkit/cli/errors.rb +6 -0
  22. data/lib/vkit/cli/policy_bundle_validator.rb +71 -0
  23. data/lib/vkit/cli/requests_cli.rb +23 -0
  24. data/lib/vkit/cli.rb +4 -0
  25. data/lib/vkit/core/auth_client.rb +104 -0
  26. data/lib/vkit/core/credential_resolver.rb +37 -0
  27. data/lib/vkit/core/credential_store.rb +186 -0
  28. data/lib/vkit/core/table_formatter.rb +36 -0
  29. data/lib/vkit/policy/bundle_compiler.rb +154 -0
  30. data/lib/vkit/policy/schema/policy_bundle.schema.json +296 -0
  31. data/lib/vkit/policy/validate_bundle.rb +37 -0
  32. data/lib/vkit/utils/banner.rb +0 -0
  33. data/lib/vkit/utils/config_loader.rb +0 -0
  34. data/lib/vkit/utils/logger.rb +0 -0
  35. data/lib/vkit.rb +3 -0
  36. metadata +94 -0
data/README.md ADDED
@@ -0,0 +1,961 @@
1
+ # VaultKit
2
+
3
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
4
+ [![Build Status](https://img.shields.io/badge/build-passing-brightgreen.svg)]()
5
+ [![Version](https://img.shields.io/badge/version-0.1.0-orange.svg)]()
6
+ [![Go Version](https://img.shields.io/badge/go-1.21+-00ADD8.svg)](https://golang.org)
7
+ [![Ruby Version](https://img.shields.io/badge/ruby-3.1+-CC342D.svg)](https://www.ruby-lang.org)
8
+
9
+ > A secure, policy-driven control plane for unified data access across heterogeneous data sources.
10
+
11
+ **TL;DR**: VaultKit prevents credential sprawl and enforces data access policies across databases. Write queries in AQL (vendor-neutral JSON), policies decide who sees what, FUNL executes safely. Schema changes require explicit review—no silent data exposure.
12
+
13
+ ---
14
+
15
+ ## 🎯 What is VaultKit?
16
+
17
+ VaultKit provides enterprise-grade governance and security for data access, enabling applications, engineers, and AI agents to query multiple data sources through a unified, policy-controlled interface.
18
+
19
+ VaultKit is a **control plane** that governs how data is accessed across your organization. It centralizes authentication, authorization, policy evaluation, and access control—ensuring that every data request is properly authenticated, authorized, and audited before execution.
20
+
21
+ ### Core Responsibilities
22
+
23
+ - **Authentication & Authorization**: Identity verification and access control with RBAC/ABAC support
24
+ - **Policy Evaluation**: Fine-grained policies based on user attributes, data sensitivity, geographic regions, and time constraints
25
+ - **Data Masking**: Column-level masking rules (full, partial, hash) applied based on user clearance
26
+ - **Connection Management**: Centralized datasource configuration and credential management
27
+ - **Approval Workflows**: Require explicit approval for accessing sensitive datasets
28
+ - **Audit Logging**: Complete audit trail of every data access request
29
+ - **Metadata Catalog**: Automated dataset classification and schema discovery
30
+ - **Session Tokens**: Short-lived, cryptographically signed tokens for Zero-Trust security
31
+ - **Schema Governance**: Git-backed policy bundles with drift detection for safe schema evolution
32
+
33
+ ---
34
+
35
+ ## ⚡ What is FUNL?
36
+
37
+ FUNL (Functional Universal Query Language) is the **data plane and execution engine** that powers VaultKit's query capabilities. While VaultKit decides *if* and *how* data can be accessed, FUNL handles the actual execution.
38
+
39
+ ### Core Responsibilities
40
+
41
+ - **AQL Translation**: Converts VaultKit's Abstract Query Language (AQL) into native query languages
42
+ - **Multi-Engine Support**: Executes queries against PostgreSQL, MySQL, Snowflake, BigQuery, and more
43
+ - **SQL Masking Engine**: Applies column-level masking at the SQL execution layer
44
+ - **Query Execution**: Handles parameterized queries with injection prevention
45
+ - **Result Sanitization**: Returns masked and filtered results back to VaultKit
46
+ - **JWT Authentication**: Only accepts cryptographically signed requests from VaultKit
47
+
48
+ FUNL is designed to be lightweight, stateless, and horizontally scalable—making it ideal for high-throughput data access scenarios.
49
+
50
+ ---
51
+
52
+ ## 🏗️ Architecture
53
+
54
+ ```mermaid
55
+ flowchart TB
56
+ subgraph clients["🖥️ Client Layer"]
57
+ app["Applications"]
58
+ cli["CLI Tools"]
59
+ agent["AI Agents"]
60
+ end
61
+
62
+ subgraph vaultkit["🛡️ VaultKit — Control Plane"]
63
+ direction TB
64
+ orch["Request Orchestrator"]
65
+ policy["Policy Engine<br/>(ABAC/RBAC)"]
66
+ auth["Authentication<br/>(JWT/SSO)"]
67
+ registry["Schema Registry"]
68
+ bundle["Active Policy Bundle"]
69
+ conn["Connection Manager"]
70
+ approval["Approval Workflow"]
71
+ audit["Audit Logger"]
72
+ end
73
+
74
+ subgraph funl["⚡ FUNL — Data Plane"]
75
+ direction TB
76
+ translator["AQL Translator"]
77
+ executor["Query Executor"]
78
+ masking["SQL Masking Layer"]
79
+ end
80
+
81
+ subgraph datasources["💾 Data Sources"]
82
+ pg[(PostgreSQL)]
83
+ mysql[(MySQL)]
84
+ snow[(Snowflake)]
85
+ bq[(BigQuery)]
86
+ end
87
+
88
+ subgraph governance["📋 Governance Layer"]
89
+ git["Git Repository<br/>(Policies & Registry)"]
90
+ scan["Schema Scans"]
91
+ cicd["CI/CD Pipeline"]
92
+ end
93
+
94
+ clients -->|AQL Request| orch
95
+ orch --> auth
96
+ orch --> policy
97
+ policy --> bundle
98
+ orch --> registry
99
+ orch --> approval
100
+ orch --> audit
101
+ orch -->|Signed Token + AQL| translator
102
+
103
+ translator --> executor
104
+ executor --> masking
105
+ masking --> datasources
106
+ datasources --> masking
107
+ masking --> executor
108
+ executor --> translator
109
+ translator -->|Masked Results| orch
110
+ orch -->|Response| clients
111
+
112
+ scan -.->|Discovers Schema| datasources
113
+ scan -->|Drift Detection| registry
114
+ git -->|Deploy Bundle| bundle
115
+ cicd -->|Build & Validate| git
116
+
117
+ style vaultkit fill:#e3f2fd
118
+ style funl fill:#fff3e0
119
+ style datasources fill:#f3e5f5
120
+ style governance fill:#f1f8e9
121
+ ```
122
+
123
+ ### Component Breakdown
124
+
125
+ #### VaultKit (Control Plane)
126
+
127
+ - **Request Orchestrator**: Routes and manages the lifecycle of data access requests
128
+ - **Policy Engine**: Evaluates ABAC/RBAC rules with support for sensitivity levels, regions, and time constraints
129
+ - **Policy Priority System**: `deny` > `require_approval` > `mask` > `allow`
130
+ - **Match Rules**: Dataset-based, field sensitivity, categories, specific field names
131
+ - **Context Awareness**: User role, clearance level, region, environment, time windows
132
+ - **Authentication Service**: Handles user identity via JWT, OAuth, or SSO integration
133
+ - **Schema Registry**: Runtime database storing discovered schemas and serving as drift baseline
134
+ - **Active Policy Bundle**: Immutable, Git-backed artifact containing all enforcement rules
135
+ - **Connection Manager**: Manages datasource credentials (supports VaultKit store, HashiCorp Vault, AWS Secrets Manager)
136
+ - **Approval Workflow**: Implements multi-stage approval for sensitive data access
137
+ - **Audit Logger**: Records every request to pluggable sinks (SQLite, PostgreSQL, S3, etc.)
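
The priority ordering listed above (`deny` > `require_approval` > `mask` > `allow`) can be pictured as a simple "most restrictive wins" reduction. The following is a minimal Ruby sketch, not VaultKit's actual engine: the `resolve_decision` name, the hash shape of a decision, the `pii_masking` policy id, and the default-deny fallback for an empty match set are all assumptions made for illustration.

```ruby
# Ordering taken from the README: deny > require_approval > mask > allow.
ACTION_PRIORITY = %w[deny require_approval mask allow].freeze

# decisions: array of hashes such as { policy: "some_id", action: "mask" }.
# Hypothetical shape; the default-deny fallback is an assumption as well.
def resolve_decision(decisions)
  return { policy: nil, action: "deny" } if decisions.empty?

  # The action with the lowest index in ACTION_PRIORITY is the most restrictive.
  decisions.min_by do |decision|
    ACTION_PRIORITY.index(decision[:action]) ||
      raise(ArgumentError, "unknown action: #{decision[:action].inspect}")
  end
end

matched = [
  { policy: "analyst_basic_access", action: "allow" },
  { policy: "pii_masking", action: "mask" },
  { policy: "financial_break_glass", action: "require_approval" }
]
winner = resolve_decision(matched)
puts "#{winner[:policy]} -> #{winner[:action]}"
```

Keeping the ordering in a single frozen array means a new action type only needs to be slotted into one place.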
138
+
139
+ #### FUNL (Data Plane)
140
+ - **AQL Translator**: Parses AQL and generates engine-specific SQL with proper escaping
141
+ - **Query Executor**: Maintains connection pools and executes parameterized queries
142
+ - **SQL Masking Layer**: Injects masking functions into SELECT statements based on policy
143
+ - **Engine Plugins**: Extensible architecture for adding new database engines
144
+
145
+ #### Governance Layer
146
+ - **Git Repository**: Source of truth for all policies and dataset registries
147
+ - **Schema Scans**: Observational tooling that discovers database changes
148
+ - **CI/CD Pipeline**: Automated validation, bundling, and deployment of policy changes
149
+
150
+ ---
151
+
152
+ ## 📋 Schema Discovery & Policy Management
153
+
154
+ VaultKit's approach to schema evolution is designed for **security, auditability, and safe change management**.
155
+
156
+ ### The Core Philosophy: Intent vs. Reality
157
+
158
+ VaultKit deliberately separates:
159
+
160
+ - **Policy Bundles** (Intent) — Git-backed, immutable definitions of *what should be allowed*
161
+ - **Schema Scans** (Reality) — Observational discovery of *what actually exists*
162
+
163
+ This separation prevents silent data exposure. When new tables or sensitive columns appear, they require **explicit policy review** before access is granted.
164
+
165
+ ```mermaid
166
+ flowchart LR
167
+ A[Live Database] -->|1. Scan Discovers| B[Schema Changes]
168
+ B -->|2. Drift Detection| C[Schema Registry]
169
+ C -->|3. Human Review| D[Policy Updates in Git]
170
+ D -->|4. CI Validation| E[Build Bundle]
171
+ E -->|5. Deploy| F[Active Bundle]
172
+ F -->|Enforces| G[Access Control]
173
+
174
+ style A fill:#f3e5f5
175
+ style C fill:#fff3e0
176
+ style D fill:#e8f5e9
177
+ style F fill:#e3f2fd
178
+ style G fill:#fce4ec
179
+ ```
180
+
181
+ ### How It Works
182
+
183
+ **Policy Bundles** are versioned artifacts that contain:
184
+ - Dataset registry (expected schema)
185
+ - Datasource definitions
186
+ - Compiled policy rules
187
+ - Bundle metadata (version, checksum, source)
188
+
189
+ **Schema Scans** perform these steps:
190
+ 1. Introspect datasources via FUNL
191
+ 2. Discover tables and columns
192
+ 3. Classify fields (PII, financial, internal)
193
+ 4. Compute diff against runtime registry
194
+ 5. Surface changes for review
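
Step 4 above (computing a diff against the runtime registry) is essentially a comparison of two nested maps. Here is a rough Ruby sketch under stated assumptions: the `schema_drift` name and the `{ dataset => { field => type } }` representation are hypothetical, and the real scanner also classifies fields, which is not shown.

```ruby
# registry / scanned: { "dataset" => { "field" => "type" } } (assumed shape).
# Returns added, removed and changed fields without modifying either input,
# mirroring the "scans inform, humans decide" principle.
def schema_drift(registry, scanned)
  drift = { added: [], removed: [], changed: [] }

  (registry.keys | scanned.keys).sort.each do |dataset|
    old_fields = registry.fetch(dataset, {})
    new_fields = scanned.fetch(dataset, {})

    (old_fields.keys | new_fields.keys).sort.each do |field|
      path = "#{dataset}.#{field}"
      if !old_fields.key?(field)
        drift[:added] << path
      elsif !new_fields.key?(field)
        drift[:removed] << path
      elsif old_fields[field] != new_fields[field]
        drift[:changed] << "#{path} (#{old_fields[field]} -> #{new_fields[field]})"
      end
    end
  end

  drift
end

baseline = { "customers" => { "email" => "text" } }
live     = { "customers" => { "email" => "varchar", "ssn" => "string" } }
p schema_drift(baseline, live)
```

The function only reports; applying the result to the baseline would remain a separate, explicit step.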
195
+
196
+ **Key Safety Principle:**
197
+
198
+ > Scans inform. Bundles enforce. Humans decide.
199
+
200
+ Scans **never** automatically update active policies. This ensures:
201
+ - No silent expansion of policy scope
202
+ - No auto-granting access to new sensitive data
203
+ - Full Git-based audit trail
204
+ - Alignment with change-control expectations in SOC 2, ISO 27001, GDPR, and HIPAA
205
+
206
+ ### Quick Example: Handling Schema Drift
207
+
208
+ ```bash
209
+ # 1. Discover schema changes (safe, read-only)
210
+ vkit scan production_db
211
+
212
+ # Output shows drift:
213
+ # + dataset: customers
214
+ # + field: ssn (string) [PII] ⚠️ NEW - requires policy
215
+ # ~ field: email (text → varchar) [PII] ✓ Already masked
216
+
217
+ # 2. Review and update baseline
218
+ vkit scan production_db --apply
219
+
220
+ # 3. Update policies in Git
221
+ vim policies/customer_data.yaml
222
+
223
+ # 4. Build and deploy new bundle
224
+ vkit policy bundle
225
+ vkit policy deploy
226
+ ```
227
+
228
+ ### CI/CD Integration (Recommended)
229
+
230
+ ```yaml
231
+ name: Schema Drift Check
+ on:
+   push:
+   schedule:
+     - cron: "0 6 * * *"  # placeholder cadence; `schedule` needs a cron entry
+ jobs:
+   drift-detection:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Scan all datasources
+         run: vkit scan --all --fail-on-drift
+
+       - name: Create PR if drift detected
+         if: failure()
+         run: |
+           vkit scan --all --diff > drift-report.md
+           gh pr create --title "Schema Drift Detected" --body-file drift-report.md
245
+ ```
246
+
247
+ 📘 **[Full Bundling & Scan System Documentation →](./docs/BUNDLING_SCAN_SYSTEM.md)**
248
+
249
+ ---
250
+
251
+ ## 🔄 AQL to Native Query Translation
252
+
253
+ AQL (Abstract Query Language) is a structured JSON format that describes **what** to fetch, not **how**. This abstraction allows VaultKit to enforce policies consistently across different database engines.
254
+
255
+ ### AQL Structure
256
+
257
+ An AQL document has the following top-level fields:
258
+
259
+ ```json
260
+ {
261
+ "source_table": "table_name",
262
+ "columns": ["field1", "field2"],
263
+ "joins": [],
264
+ "aggregates": [],
265
+ "filters": [],
266
+ "group_by": [],
267
+ "having": [],
268
+ "order_by": null,
269
+ "limit": 0,
270
+ "offset": 0
271
+ }
272
+ ```
273
+
274
+ ### Translation Examples
275
+
276
+ **Simple Query:**
277
+ ```json
278
+ {
279
+ "source_table": "customers",
280
+ "columns": ["email", "country", "revenue"],
281
+ "filters": [
282
+ { "field": "country", "operator": "eq", "value": "US" },
283
+ { "field": "revenue", "operator": "gt", "value": 10000 }
284
+ ],
285
+ "limit": 100
286
+ }
287
+ ```
288
+
289
+ **FUNL Translation → PostgreSQL:**
290
+ ```sql
291
+ SELECT email, country, revenue
292
+ FROM customers
293
+ WHERE country = $1 AND revenue > $2
294
+ LIMIT 100
295
+ ```
296
+
297
+ **FUNL Translation → MySQL:**
298
+ ```sql
299
+ SELECT `email`, `country`, `revenue`
300
+ FROM `customers`
301
+ WHERE `country` = ? AND `revenue` > ?
302
+ LIMIT 100
303
+ ```
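
To make the dialect difference concrete, the sketch below turns the simple AQL query above into parameterized SQL for PostgreSQL (`$1`, `$2`) and MySQL (backticks and `?`). FUNL itself is described as a separate execution engine, so this Ruby version is only an illustration: `translate_simple`, `quote_identifier`, the identifier check, and the choice to treat a missing or zero `limit` as "no limit" are assumptions, and joins, aggregates and the other operators are left out.

```ruby
# Comparison operators from the README's "Supported Operators" list.
OPERATORS = {
  "eq" => "=", "neq" => "<>", "gt" => ">", "lt" => "<", "gte" => ">=", "lte" => "<="
}.freeze
IDENTIFIER = /\A[A-Za-z_][A-Za-z0-9_]*\z/

# Identifiers cannot be bound as parameters, so they are validated instead.
def quote_identifier(name, dialect)
  raise ArgumentError, "invalid identifier: #{name.inspect}" unless IDENTIFIER.match?(name)

  dialect == :mysql ? "`#{name}`" : name
end

# Returns [sql, params]; filter values never appear in the SQL text itself.
def translate_simple(aql, dialect)
  params = []
  conditions = aql.fetch("filters", []).map do |filter|
    operator = OPERATORS.fetch(filter["operator"]) # KeyError on unsupported operators
    params << filter["value"]
    placeholder = dialect == :mysql ? "?" : "$#{params.size}"
    "#{quote_identifier(filter['field'], dialect)} #{operator} #{placeholder}"
  end

  columns = aql.fetch("columns").map { |c| quote_identifier(c, dialect) }.join(", ")
  sql = "SELECT #{columns} FROM #{quote_identifier(aql.fetch('source_table'), dialect)}"
  sql += " WHERE #{conditions.join(' AND ')}" unless conditions.empty?
  limit = aql["limit"].to_i
  sql += " LIMIT #{limit}" if limit.positive?
  [sql, params]
end

aql = {
  "source_table" => "customers",
  "columns" => %w[email country revenue],
  "filters" => [
    { "field" => "country", "operator" => "eq", "value" => "US" },
    { "field" => "revenue", "operator" => "gt", "value" => 10_000 }
  ],
  "limit" => 100
}
puts translate_simple(aql, :postgres).first
puts translate_simple(aql, :mysql).first
```

Returning the parameter list alongside the SQL keeps values out of the query string, which is the property the injection-prevention claims rely on.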
304
+
305
+ **Complex Query with JOINs and Aggregates:**
306
+ ```json
307
+ {
308
+ "source_table": "users",
309
+ "joins": [
310
+ {
311
+ "type": "LEFT",
312
+ "table": "orders",
313
+ "left_field": "users.id",
314
+ "right_field": "orders.user_id"
315
+ }
316
+ ],
317
+ "columns": ["users.email", "users.username"],
318
+ "aggregates": [
319
+ {
320
+ "func": "sum",
321
+ "field": "orders.amount",
322
+ "alias": "total_spent"
323
+ }
324
+ ],
325
+ "group_by": ["users.email", "users.username"],
326
+ "having": [
327
+ {
328
+ "operator": "gt",
329
+ "field": "SUM(orders.amount)",
330
+ "value": 500
331
+ }
332
+ ],
333
+ "order_by": {
334
+ "column": "total_spent",
335
+ "direction": "DESC"
336
+ },
337
+ "limit": 10
338
+ }
339
+ ```
340
+
341
+ **FUNL Translation → PostgreSQL:**
342
+ ```sql
343
+ SELECT users.email, users.username, SUM(orders.amount) AS total_spent
344
+ FROM users
345
+ LEFT JOIN orders ON users.id = orders.user_id
346
+ GROUP BY users.email, users.username
347
+ HAVING SUM(orders.amount) > $1
348
+ ORDER BY total_spent DESC
349
+ LIMIT 10
350
+ ```
351
+
352
+ ### Advanced Features
353
+
354
+ **Complex Filtering with OR Logic:**
355
+ ```json
356
+ {
357
+ "source_table": "users",
358
+ "columns": ["id", "email", "role"],
359
+ "filters": [
360
+ {
361
+ "logic": "OR",
362
+ "conditions": [
363
+ { "field": "users.role", "operator": "eq", "value": "admin" },
364
+ { "field": "users.role", "operator": "eq", "value": "manager" }
365
+ ]
366
+ },
367
+ { "field": "orders.amount", "operator": "gt", "value": 100 }
368
+ ]
369
+ }
370
+ ```
371
+
372
+ **Translation:**
373
+ ```sql
374
+ SELECT id, email, role
375
+ FROM users
376
+ WHERE (users.role = $1 OR users.role = $2) AND orders.amount > $3
377
+ ```
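
Nested `logic`/`conditions` groups like the one above lend themselves to a small recursive builder. The following Ruby sketch shows one way the `WHERE` clause could be assembled with PostgreSQL-style placeholders; `build_condition`, `build_where`, the field-name check, and the assumption that top-level filters are joined with `AND` are illustrative choices, not FUNL's confirmed behaviour.

```ruby
SQL_OPERATORS = {
  "eq" => "=", "neq" => "<>", "gt" => ">", "lt" => "<", "gte" => ">=", "lte" => "<="
}.freeze
# Allows "column" or "table.column" (assumed naming rule).
FIELD_NAME = /\A[A-Za-z_]\w*(\.[A-Za-z_]\w*)?\z/

# Recursively renders one filter node, appending bound values to params.
def build_condition(node, params)
  if node.key?("conditions")
    logic = node.fetch("logic", "AND").upcase
    raise ArgumentError, "unsupported logic: #{logic}" unless %w[AND OR].include?(logic)

    parts = node["conditions"].map { |child| build_condition(child, params) }
    "(" + parts.join(" #{logic} ") + ")"
  else
    raise ArgumentError, "invalid field: #{node['field'].inspect}" unless FIELD_NAME.match?(node["field"])

    params << node["value"]
    "#{node['field']} #{SQL_OPERATORS.fetch(node['operator'])} $#{params.size}"
  end
end

# Top-level filters are combined with AND (assumption based on the example).
def build_where(filters)
  params = []
  clause = filters.map { |filter| build_condition(filter, params) }.join(" AND ")
  [clause, params]
end

filters = [
  { "logic" => "OR", "conditions" => [
    { "field" => "users.role", "operator" => "eq", "value" => "admin" },
    { "field" => "users.role", "operator" => "eq", "value" => "manager" }
  ] },
  { "field" => "orders.amount", "operator" => "gt", "value" => 100 }
]
clause, params = build_where(filters)
puts "WHERE #{clause}"
p params
```

Because placeholders are numbered from the shared `params` array, the numbering stays correct however deeply groups are nested.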
378
+
379
+ **Supported Operators:**
380
+ - Comparison: `eq`, `neq`, `gt`, `lt`, `gte`, `lte`
381
+ - Pattern matching: `like`
382
+ - Set operations: `in`
383
+ - Null checks: `is_null`, `is_not_null`
384
+
385
+ **Supported Aggregations:**
386
+ - `sum`, `count`, `avg`, `min`, `max`
387
+
388
+ **Supported JOINs:**
389
+ - `INNER`, `LEFT`, `RIGHT`, `FULL`
390
+
391
+ ### SQL-Level Masking
392
+
393
+ FUNL's **Masking Dialect System** applies transformations directly in SQL based on VaultKit's policy decisions:
394
+
395
+ | Masking Type | PostgreSQL | MySQL | Snowflake |
396
+ |-------------|------------|-------|-----------|
397
+ | **Full** | `'*****' AS email` | `'*****' AS email` | `'*****' AS email` |
398
+ | **Partial** | `CONCAT(LEFT(email, 3), '****')` | `CONCAT(LEFT(email, 3), '****')` | `CONCAT(LEFT(email, 3), '****')` |
399
+ | **Hash** | `ENCODE(SHA256(email::bytea), 'hex')` | `SHA2(email, 256)` | `SHA2(email, 256)` |
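
The table above maps naturally onto a per-dialect lookup. This is a hedged Ruby sketch of how a select-list item could be generated from a masking decision; the SQL fragments are copied from the table, while `masked_select_item`, the dialect symbols, and the choice to alias every masked expression back to the column name are assumptions.

```ruby
COLUMN_NAME = /\A[A-Za-z_][A-Za-z0-9_]*\z/
DIALECTS = %i[postgres mysql snowflake].freeze

# SQL fragments copied from the masking table; keyed by mask type.
MASK_EXPRESSIONS = {
  full: ->(_column, _dialect) { "'*****'" },
  partial: ->(column, _dialect) { "CONCAT(LEFT(#{column}, 3), '****')" },
  hash: ->(column, dialect) {
    dialect == :postgres ? "ENCODE(SHA256(#{column}::bytea), 'hex')" : "SHA2(#{column}, 256)"
  }
}.freeze

# Returns the select-list item for one column; nil mask_type means unmasked.
def masked_select_item(column, mask_type, dialect)
  raise ArgumentError, "invalid column: #{column.inspect}" unless COLUMN_NAME.match?(column)
  raise ArgumentError, "unknown dialect: #{dialect.inspect}" unless DIALECTS.include?(dialect)
  return column if mask_type.nil?

  expression = MASK_EXPRESSIONS.fetch(mask_type).call(column, dialect)
  "#{expression} AS #{column}" # alias keeps the result column name stable (assumption)
end

puts masked_select_item("email", :partial, :mysql)
puts masked_select_item("email", :hash, :postgres)
```

Aliasing back to the original column name means callers see the same result shape whether or not a field was masked.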
400
+
401
+ ---
402
+
403
+ ## ✨ Key Features
404
+
405
+ ### VaultKit (Control Plane)
406
+
407
+ | Feature | Description |
408
+ |---------|-------------|
409
+ | **AQL Orchestration** | Vendor-neutral query language prevents SQL injection and enables policy enforcement |
410
+ | **Policy Engine** | Attribute-based access control with support for clearance levels, regions, sensitivity tags, and time windows |
411
+ | **Policy Priority System** | Hierarchical decision making: `deny` > `require_approval` > `mask` > `allow` |
412
+ | **Field-Level Policies** | Match rules based on dataset name, field sensitivity, categories (pii, financial, etc.), or specific field names |
413
+ | **Context-Aware Rules** | Policies evaluate requester role, clearance level, region, environment, and time constraints |
414
+ | **Approval Workflows** | Require manager/security approval for accessing PII, financial data, or production environments |
415
+ | **Zero-Trust Sessions** | Short-lived, cryptographically signed tokens with automatic expiration |
416
+ | **Credential Abstraction** | Never expose database credentials to users—supports multiple secret backends |
417
+ | **Git-Backed Governance** | All policies and registries versioned in Git with full audit trail |
418
+ | **Schema Drift Detection** | Automated discovery of database changes with manual approval workflow |
419
+ | **Comprehensive Auditing** | Every query logged with user identity, timestamp, policy decisions, and results metadata |
420
+ | **CLI & SDK** | Rich command-line tools (`vkit`) and programmatic SDK for integration |
421
+
422
+ ### FUNL (Data Plane)
423
+
424
+ | Feature | Description |
425
+ |---------|-------------|
426
+ | **Multi-Engine Translation** | Supports PostgreSQL, MySQL, Snowflake, BigQuery with identical AQL interface |
427
+ | **SQL-Level Masking** | Masking applied during query execution—no post-processing overhead |
428
+ | **Injection Prevention** | All queries use parameterized execution with proper type binding |
429
+ | **Raw SQL Support** | Supports raw SQL for schema introspection and admin operations (policy-controlled) |
430
+ | **Horizontal Scaling** | Stateless design enables running multiple FUNL instances behind a load balancer |
431
+ | **JWT Verification** | Only executes queries signed by VaultKit's private key |
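
The "JWT Verification" row implies an asymmetric scheme: the control plane signs with its private key and the data plane checks the signature with the matching public key. Below is a minimal, standard-library-only Ruby sketch of RS256-style signing and verification. It is illustrative only: the claim names, `sign_token`, `verify_token`, and the rule that a token without a future `exp` is rejected are assumptions, and a production system would normally use a maintained JWT library instead.

```ruby
require "openssl"
require "json"

# URL-safe Base64 without padding, as used by compact JWTs.
def b64url(bytes)
  [bytes].pack("m0").tr("+/", "-_").delete("=")
end

def b64url_decode(text)
  padded = text + "=" * ((4 - text.size % 4) % 4)
  padded.tr("-_", "+/").unpack1("m0") # strict decoding; raises ArgumentError on garbage
end

# Signs claims with RSASSA-PKCS1-v1_5 + SHA-256 (the JWT "RS256" algorithm).
def sign_token(claims, private_key)
  header = b64url(JSON.generate({ alg: "RS256", typ: "JWT" }))
  payload = b64url(JSON.generate(claims))
  signing_input = "#{header}.#{payload}"
  signature = private_key.sign(OpenSSL::Digest.new("SHA256"), signing_input)
  "#{signing_input}.#{b64url(signature)}"
end

# Returns the claims hash when the signature is valid and "exp" is in the
# future, otherwise nil.
def verify_token(token, public_key, now: Time.now.to_i)
  parts = token.split(".")
  return nil unless parts.size == 3

  header, payload, signature = parts
  return nil unless JSON.parse(b64url_decode(header))["alg"] == "RS256"

  digest = OpenSSL::Digest.new("SHA256")
  return nil unless public_key.verify(digest, b64url_decode(signature), "#{header}.#{payload}")

  claims = JSON.parse(b64url_decode(payload))
  claims["exp"].to_i > now ? claims : nil
rescue ArgumentError, JSON::ParserError, OpenSSL::PKey::PKeyError
  nil
end

key = OpenSSL::PKey::RSA.new(2048) # stands in for vaultkit_private.pem
token = sign_token({ "sub" => "analyst@example.com", "exp" => Time.now.to_i + 900 }, key)
p verify_token(token, key.public_key)
```

Pinning the accepted algorithm to `RS256` on the verifying side avoids trusting whatever the token header claims.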
432
+
433
+ ---
434
+
435
+ ## 🚀 Getting Started
436
+
437
+ ### Prerequisites Checklist
438
+
439
+ Before installing VaultKit, ensure you have:
440
+
441
+ - [ ] **Ruby** 3.1+ installed (`ruby --version`)
442
+ - [ ] **Go** 1.21+ installed (`go version`)
443
+ - [ ] **OpenSSL** for key generation
444
+ - [ ] **Git** for policy management
445
+ - [ ] Access credentials for at least one supported datasource
446
+ - [ ] (Optional) **Docker** for containerized FUNL deployment
447
+
448
+ ### Quick Start (5 Minutes)
449
+
450
+ This streamlined path gets you querying data quickly. For production deployments, see [Detailed Setup](#detailed-setup) below.
451
+
452
+ #### 1. Generate Cryptographic Keys
453
+
454
+ ```bash
455
+ # Generate RSA key pair for token signing
456
+ openssl genpkey -algorithm RSA -out vaultkit_private.pem -pkeyopt rsa_keygen_bits:2048
457
+ openssl rsa -pubout -in vaultkit_private.pem -out vaultkit_public.pem
458
+
459
+ # Set environment variables
460
+ export VKIT_PRIVATE_KEY="$(pwd)/vaultkit_private.pem"
461
+ export VKIT_PUBLIC_KEY="$(pwd)/vaultkit_public.pem"
462
+ export FUNL_PUBLIC_KEY="$(pwd)/vaultkit_public.pem"
463
+ ```
464
+
465
+ ⚠️ **Security Note**: Keep your private key secure. Add `*.pem` to your `.gitignore`.
466
+
467
+ #### 2. Start FUNL
468
+
469
+ ```bash
470
+ # Using Docker (recommended)
471
+ export FUNL_URL="http://localhost:8080"
472
+ docker compose up -d
473
+
474
+ # Verify FUNL is running
475
+ curl http://localhost:8080/health
476
+ ```
477
+
478
+ #### 3. Install VaultKit CLI
479
+
480
+ ```bash
481
+ git clone https://github.com/yourorg/vaultkitcli.git
482
+ cd vaultkitcli
483
+ gem build vaultkitcli.gemspec
484
+ gem install ./vaultkitcli-0.1.0.gem
485
+
486
+ # Verify installation
487
+ vkit --version
488
+ ```
489
+
490
+ #### 4. Initialize & Connect
491
+
492
+ ```bash
493
+ # Authenticate
494
+ vkit login
495
+
496
+ # Register your first datasource
497
+ vkit datasource add \
498
+ --id demo_db \
499
+ --engine postgres \
500
+ --username readonly_user \
501
+ --password "$DB_PASSWORD" \
502
+ --config '{
503
+ "host": "localhost",
504
+ "port": 5432,
505
+ "database": "analytics"
506
+ }'
507
+
508
+ # Discover schema
509
+ vkit scan demo_db --apply
510
+ ```
511
+
512
+ #### 5. Query Data
513
+
514
+ ```bash
515
+ # Submit AQL query
516
+ vkit request --datasource demo_db --aql '{
517
+ "source_table": "users",
518
+ "columns": ["id", "email", "created_at"],
519
+ "limit": 10
520
+ }'
521
+
522
+ # Fetch results (use grant ID from previous command)
523
+ vkit fetch --grant grant_abc123xyz
524
+ ```
525
+
526
+ 🎉 **You're now querying data through VaultKit!** Continue to [Detailed Setup](#detailed-setup) for production configuration.
527
+
528
+ ---
529
+
530
+ ### Detailed Setup
531
+
532
+ #### Installation Options
533
+
534
+ **Option A: From Source (Recommended for Development)**
535
+
536
+ ```bash
537
+ # Clone and install CLI
538
+ git clone https://github.com/yourorg/vaultkitcli.git
539
+ cd vaultkitcli
540
+ bundle install
541
+ gem build vaultkitcli.gemspec
542
+ gem install ./vaultkitcli-0.1.0.gem
543
+
544
+ # Clone and build FUNL
545
+ git clone https://github.com/yourorg/funl.git
546
+ cd funl
547
+ go build -o funl ./cmd/funl
548
+ ./funl serve --port 8080
549
+ ```
550
+
551
+ **Option B: Using Docker (Recommended for Production)**
552
+
553
+ ```yaml
+ # docker-compose.yml
+ version: '3.8'
+ services:
+   funl:
+     image: vaultkit/funl:0.1.0
+     ports:
+       - "8080:8080"
+     environment:
+       - FUNL_PUBLIC_KEY=/keys/vaultkit_public.pem
+     volumes:
+       - ./vaultkit_public.pem:/keys/vaultkit_public.pem:ro
+     healthcheck:
+       test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
+       interval: 10s
+       timeout: 3s
+       retries: 3
570
+ ```
571
+
572
+ #### Environment Configuration
573
+
574
+ Create a configuration file for persistent settings:
575
+
576
+ ```yaml
577
+ # ~/.vkit/config.yaml
578
+ funl_url: "http://localhost:8080"
579
+ private_key_path: "/path/to/vaultkit_private.pem"
580
+ public_key_path: "/path/to/vaultkit_public.pem"
581
+ default_datasource: "production_db"
582
+ audit_log_path: "/var/log/vaultkit/audit.log"
583
+ ```
584
+
585
+ Add environment variables to your shell profile:
586
+
587
+ ```bash
588
+ # ~/.bashrc or ~/.zshrc
589
+ export VKIT_PRIVATE_KEY="/path/to/vaultkit_private.pem"
590
+ export VKIT_PUBLIC_KEY="/path/to/vaultkit_public.pem"
591
+ export FUNL_PUBLIC_KEY="/path/to/vaultkit_public.pem"
592
+ export FUNL_URL="http://localhost:8080"
593
+ export VKIT_CONFIG_PATH="$HOME/.vkit/config.yaml"
594
+ ```
595
+
596
+ #### Datasource Configuration
597
+
598
+ VaultKit supports multiple credential storage backends:
599
+
600
+ **Built-in Storage (Development):**
601
+ ```bash
602
+ vkit datasource add \
603
+ --id staging_db \
604
+ --engine postgres \
605
+ --username app_user \
606
+ --password "$DB_PASSWORD" \
607
+ --config '{
608
+ "host": "staging.db.internal",
609
+ "port": 5432,
610
+ "database": "analytics",
611
+ "ssl_mode": "require",
612
+ "connect_timeout": 10
613
+ }'
614
+ ```
615
+
616
+ **HashiCorp Vault (Production):**
617
+ ```bash
618
+ vkit datasource add \
619
+ --id production_db \
620
+ --engine postgres \
621
+ --credential-backend vault \
622
+ --vault-path secret/data/databases/production \
623
+ --config '{
624
+ "host": "prod.db.internal",
625
+ "port": 5432,
626
+ "database": "analytics",
627
+ "ssl_mode": "verify-full"
628
+ }'
629
+ ```
630
+
631
+ **AWS Secrets Manager:**
632
+ ```bash
633
+ vkit datasource add \
634
+ --id cloud_db \
635
+ --engine postgres \
636
+ --credential-backend aws-secrets \
637
+ --secret-arn arn:aws:secretsmanager:us-east-1:123456789:secret:db-creds \
638
+ --config '{
639
+ "host": "rds.amazonaws.com",
640
+ "port": 5432,
641
+ "database": "production"
642
+ }'
643
+ ```
644
+
645
+ #### Policy Setup
646
+
647
+ Initialize your policy repository:
648
+
649
+ ```bash
650
+ # Create policy repository structure
651
+ mkdir -p vaultkit-policies/{policies,registries,datasources}
652
+ cd vaultkit-policies
653
+
654
+ git init
655
+ git add .
656
+ git commit -m "Initialize VaultKit policies"
657
+
658
+ # Link VaultKit to your policy repo
659
+ vkit policy init --repo "$(pwd)"
660
+ ```
661
+
662
+ Create your first policy (`policies/base_access.yaml`):
663
+
664
+ ```yaml
665
+ id: analyst_basic_access
+ description: Basic read access for analysts with PII masking
+ priority: 100
+
+ match:
+   datasets:
+     - customers
+     - orders
+
+ context:
+   requester_role: analyst
+   environment: production
+
+ rules:
+   - field_category: pii
+     action: mask
+     mask_type: partial
+     reason: "PII must be masked for analysts"
+
+   - field_category: financial
+     action: require_approval
+     approver_role: finance_manager
+     ttl: "2h"
+     reason: "Financial data requires approval"
+
+   - default:
+       action: allow
+       ttl: "8h"
693
+ ```
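
To show how a policy file like this might be read and applied, here is a small Ruby sketch that loads a trimmed copy of the policy with the standard `yaml` library, checks whether it applies to a request, and picks the action for a field category. The nesting of the `default` rule and the `applies?` / `action_for` helpers are assumptions; the real evaluation lives in the VaultKit control plane and may differ.

```ruby
require "yaml"

# Trimmed copy of policies/base_access.yaml; the indentation is an assumption.
POLICY_SOURCE = <<~POLICY
  id: analyst_basic_access
  match:
    datasets:
      - customers
      - orders
  context:
    requester_role: analyst
    environment: production
  rules:
    - field_category: pii
      action: mask
      mask_type: partial
    - field_category: financial
      action: require_approval
    - default:
        action: allow
POLICY

# True when the dataset is listed and every context key matches the request.
def applies?(policy, dataset, request_context)
  policy.dig("match", "datasets").include?(dataset) &&
    policy.fetch("context", {}).all? { |key, value| request_context[key] == value }
end

# Category-specific rule first, then the default rule, else nil.
def action_for(policy, field_category)
  rules = policy.fetch("rules")
  rule = rules.find { |r| r.key?("field_category") && r["field_category"] == field_category }
  rule ||= rules.find { |r| r.key?("default") }&.fetch("default")
  rule && rule["action"]
end

policy = YAML.safe_load(POLICY_SOURCE)
request = { "requester_role" => "analyst", "environment" => "production" }
if applies?(policy, "customers", request)
  %w[pii financial internal].each { |category| puts "#{category}: #{action_for(policy, category)}" }
end
```

`YAML.safe_load` is used instead of `YAML.load` so that a policy file cannot instantiate arbitrary Ruby objects.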
694
+
695
+ Build and deploy the policy bundle:
696
+
697
+ ```bash
698
+ # Validate policies
699
+ vkit policy validate
700
+
701
+ # Build bundle
702
+ vkit policy bundle --output bundle.tar.gz
703
+
704
+ # Deploy to active control plane
705
+ vkit policy deploy --bundle bundle.tar.gz
706
+ ```
707
+
708
+ ---
709
+
710
+ ## 🎯 Use Cases
711
+
712
+ ### 1. **Secure AI Agent Access**
713
+ Enable LLM-powered agents to query production databases without exposing credentials. VaultKit enforces policies that restrict which tables and columns AI agents can access, with automatic masking of sensitive fields.
714
+
715
+ **Problem:**
716
+ - AI agents need data context for decision-making
717
+ - Direct database access creates security risks
718
+ - Credentials in AI systems can leak
719
+ - Uncontrolled queries can expose sensitive data
720
+
721
+ **VaultKit Solution:**
722
+
723
+ ```yaml
724
+ # policies/ai_agent_restrictions.yaml
+ id: ai_agent_restrictions
+ match:
+   fields:
+     category: pii
+
+ context:
+   requester_role: ai_agent
+   environment: production
+
+ action:
+   mask: true
+   mask_type: hash
+   reason: "PII must be hashed for AI agents"
+   ttl: "15m"
739
+ ```
740
+
741
+ **How it works:**
742
+ 1. AI agents receive short-lived session tokens (15 minutes)
743
+ 2. All PII fields automatically hashed before results return
744
+ 3. Agents can analyze patterns without accessing raw sensitive data
745
+ 4. Every query logged with AI agent identity
746
+
747
+ **CLI Usage:**
748
+ ```bash
749
+ # AI agent requests data
750
+ vkit request \
751
+ --datasource production_db \
752
+ --role ai_agent \
753
+ --aql '{
754
+ "source_table": "customer_behavior",
755
+ "columns": ["user_id", "email", "purchase_amount"],
756
+ "filters": [{"field": "purchase_amount", "operator": "gt", "value": 1000}]
757
+ }'
758
+
759
+ # Result: email field is hashed
760
+ # {
761
+ # "user_id": 12345,
762
+ # "email": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
763
+ # "purchase_amount": 1250.00
764
+ # }
765
+ ```
766
+
767
+ ---
768
+
769
+ ### 2. **Cross-Region Compliance (GDPR, CCPA, HIPAA)**
770
+ Automatically mask or deny access to sensitive fields based on user location and data residency requirements.
771
+
772
+ **Problem:**
773
+ - GDPR restricts transfers of EU residents' personal data outside the EU/EEA
774
+ - CCPA gives California residents the right to know how their data is accessed and disclosed
775
+ - HIPAA restricts PHI access based on business need
776
+ - Manual compliance is error-prone and doesn't scale
777
+
778
+ **VaultKit Solution:**
779
+
780
+ ```yaml
781
+ # policies/gdpr_protection.yaml
+ id: gdpr_cross_region_deny
+ match:
+   dataset: eu_customers
+   fields:
+     category: pii
+
+ context:
+   requester_region: US
+   dataset_region: EU
+
+ action:
+   deny: true
+   reason: "Cross-region PII access forbidden due to GDPR Article 44"
+   audit_flag: compliance_violation
+
+ ---
+ id: ccpa_disclosure_logging
+ match:
+   dataset: california_residents
+
+ context:
+   requester_region: ANY
+
+ action:
+   allow: true
+   ttl: "1h"
+   audit_metadata:
+     ccpa_disclosure: true
+     data_subject_rights: "User has right to know about this access"
811
+ ```
812
+
813
+ **How it works:**
814
+ - US-based analysts querying EU customer data are automatically denied
815
+ - Policy engine evaluates both requester and dataset regions
816
+ - Audit log captures denied attempts with compliance flags
817
+ - CCPA-flagged queries generate disclosure reports
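
The region checks described above come down to matching a policy's `context` block against the request, with `ANY` acting as a wildcard. A rough Ruby sketch under stated assumptions follows: `context_matches?`, `evaluate`, the first-match selection, and the default-deny fallback are hypothetical simplifications of whatever the policy engine actually does.

```ruby
# Contexts taken from the two policies above, reduced to the fields used here.
REGION_POLICIES = [
  { "id" => "gdpr_cross_region_deny",
    "context" => { "requester_region" => "US", "dataset_region" => "EU" },
    "action" => "deny", "audit_flag" => "compliance_violation" },
  { "id" => "ccpa_disclosure_logging",
    "context" => { "requester_region" => "ANY", "dataset_region" => "US-CA" },
    "action" => "allow", "audit_flag" => nil }
].freeze
# "dataset_region" => "US-CA" is a made-up value so both policies can coexist
# in one list; the README keys the CCPA policy on a dataset name instead.

# "ANY" in the policy matches every request value for that key.
def context_matches?(policy_context, request_context)
  policy_context.all? do |key, expected|
    expected == "ANY" || request_context[key] == expected
  end
end

# First matching policy wins; no match falls back to deny (assumption).
def evaluate(policies, request_context)
  policy = policies.find { |candidate| context_matches?(candidate["context"], request_context) }
  return { "action" => "deny", "policy" => nil, "audit_flag" => nil } unless policy

  { "action" => policy["action"], "policy" => policy["id"], "audit_flag" => policy["audit_flag"] }
end

p evaluate(REGION_POLICIES, { "requester_region" => "US", "dataset_region" => "EU" })
```

Returning the policy id and audit flag with the decision gives the audit log enough context to explain a denial.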
818
+
819
+ **Multi-Region Access Pattern:**
820
+ ```bash
821
+ # US analyst attempts to query EU data
822
+ vkit request --datasource eu_customers_db --aql '{...}'
823
+
824
+ # Response:
825
+ # ❌ Access Denied
826
+ # Reason: Cross-region PII access forbidden due to GDPR Article 44
827
+ # Policy: gdpr_cross_region_deny
828
+ # Audit ID: audit_20240115_abc123
829
+ ```
830
+
831
+ ---
832
+
833
+ ### 3. **Just-In-Time Access (Break-Glass)**
834
+ Developers can request temporary elevated access to production data with automatic approval workflows and audit trails.
835
+
836
+ **Problem:**
837
+ - Engineers need emergency production access for incident resolution
838
+ - Standing privileges violate principle of least privilege
839
+ - Manual approval processes slow incident response
840
+ - Audit trails are incomplete or manual
841
+
842
+ **VaultKit Solution:**
843
+
844
+ ```yaml
845
+ # policies/financial_requires_approval.yaml
+ id: financial_break_glass
+ match:
+   fields:
+     category: financial
+
+ context:
+   environment: production
+   requester_role: engineer
+
+ action:
+   require_approval: true
+   approver_role: finance_manager
+   approval_metadata_required:
+     - incident_ticket
+     - business_justification
+   ttl: "1h"
+   reason: "Financial data access requires approval"
+   auto_revoke: true
864
+ ```
865
+
866
+ **CLI Usage:**
867
+ ```bash
868
+ # Engineer requests emergency access
869
+ vkit request \
870
+ --datasource production_db \
871
+ --approval-required \
872
+ --reason "Investigating payment processing failure - Ticket INC-5432" \
873
+ --metadata '{"incident_ticket": "INC-5432", "business_justification": "Payment processor reporting transaction mismatch"}' \
874
+ --duration "2h" \
875
+ --aql '{
876
+ "source_table": "transactions",
877
+ "columns": ["transaction_id", "amount", "status"],
878
+ "filters": [{"field": "status", "operator": "eq", "value": "failed"}]
879
+ }'
880
+
881
+ # Response:
882
+ # ⏳ Approval Required
883
+ # Request ID: req_20240115_xyz789
884
+ # Status: PENDING
885
+ # Approver: finance_manager
886
+ # Expires: 2024-01-15 18:30 UTC
887
+ ```
888
+
889
+ **Approval Workflow:**
890
+ ```bash
891
+ # Finance manager reviews request
892
+ vkit approval list --pending
893
+
894
+ # Approve with notes
895
+ vkit approval grant \
896
+ --request-id req_20240115_xyz789 \
897
+ --notes "Approved for incident resolution. Access limited to failed transactions only."
898
+
899
+ # Engineer notified and can fetch data
900
+ vkit fetch --grant grant_approved_abc123
901
+ ```
902
+
903
+ **Audit Trail Output:**
904
+ ```json
905
+ {
906
+ "request_id": "req_20240115_xyz789",
907
+ "requester": "engineer@company.com",
908
+ "requester_role": "engineer",
909
+ "requested_at": "2024-01-15T16:00:00Z",
910
+ "approved_by": "finance.manager@company.com",
911
+ "approved_at": "2024-01-15T16:05:32Z",
912
+ "approval_reason": "Approved for incident resolution",
913
+ "access_granted": "2024-01-15T16:05:32Z",
914
+ "access_expires": "2024-01-15T18:05:32Z",
915
+ "queries_executed": 3,
916
+ "rows_accessed": 147,
917
+ "incident_ticket": "INC-5432",
918
+ "auto_revoked": true
919
+ }
920
+ ```
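
The `access_granted` and `access_expires` timestamps in the audit record differ by exactly the granted duration. The Ruby sketch below shows how a TTL string such as `"15m"` or `"2h"` could be converted into an expiry timestamp and checked for auto-revocation. The helper names and the `s` and `d` units are assumptions; only minute and hour values appear in this README.

```ruby
require "time"

# Seconds per unit; "m" and "h" appear in the README, "s" and "d" are assumed.
TTL_UNITS = { "s" => 1, "m" => 60, "h" => 3600, "d" => 86_400 }.freeze

def ttl_seconds(ttl)
  match = /\A(\d+)([smhd])\z/.match(ttl) ||
          raise(ArgumentError, "unsupported TTL: #{ttl.inspect}")
  match[1].to_i * TTL_UNITS.fetch(match[2])
end

# Both timestamps are ISO 8601 strings, as in the audit trail above.
def access_expires_at(granted_at, ttl)
  (Time.iso8601(granted_at) + ttl_seconds(ttl)).utc.iso8601
end

# A grant counts as expired once the current time reaches its expiry.
def expired?(expires_at, now: Time.now)
  now >= Time.iso8601(expires_at)
end

expires = access_expires_at("2024-01-15T16:05:32Z", "2h")
puts expires
puts expired?(expires, now: Time.iso8601("2024-01-15T17:00:00Z"))
```

Passing `now:` explicitly keeps the expiry check deterministic in tests.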
921
+
922
+ ---
923
+
924
+ ### 4. **Time-Restricted Access Windows**
925
+ Enforce access controls based on business hours, maintenance windows, or compliance requirements.
926
+
927
+ **Problem:**
928
+ - Audit logs should only be accessible during business hours
929
+ - Production data access outside work hours indicates potential misuse
930
+ - Compliance requires time-based controls for certain datasets
931
+
932
+ **VaultKit Solution:**
933
+
934
+ ```yaml
935
+ # policies/time_restricted_access.yaml
+ id: business_hours_only
+ match:
+   dataset: audit_logs
+   environment: production
+
+ context:
+   time:
+     start: "08:00"
+     end: "18:00"
+     timezone: "America/Toronto"
+     days: ["monday", "tuesday", "wednesday", "thursday", "friday"]
+
+ action:
+   allow: true
+   reason: "Access allowed during business hours"
+   ttl: "1h"
+
+ ---
+ id: after_hours_deny
+ match:
+   dataset: audit_logs
+   environment: production
+
+ context:
+   time:
+     outside_window