PyPI - queryguard-cli - Versions diffs - 0.1.0__tar.gz - Mend

queryguard-cli 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

queryguard_cli-0.1.0/LICENSE +21 -0
queryguard_cli-0.1.0/PKG-INFO +124 -0
queryguard_cli-0.1.0/README.md +99 -0
queryguard_cli-0.1.0/pyproject.toml +32 -0
queryguard_cli-0.1.0/queryguard/__init__.py +0 -0
queryguard_cli-0.1.0/queryguard/analysis.py +67 -0
queryguard_cli-0.1.0/queryguard/client.py +121 -0
queryguard_cli-0.1.0/queryguard/console_utils.py +20 -0
queryguard_cli-0.1.0/queryguard/main.py +127 -0

queryguard_cli-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Mark de Haan
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

queryguard_cli-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,124 @@
+Metadata-Version: 2.4
+Name: queryguard-cli
+Version: 0.1.0
+Summary: A BigQuery Cost Analysis CLI Tool
+License-File: LICENSE
+Keywords: bigquery,gcp,cost-optimization,finops,cli
+Author: Mark de Haan
+Author-email: markdehaan90@gmail.com
+Requires-Python: >=3.12,<4.0
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: System Administrators
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
+Classifier: Topic :: Database
+Classifier: Topic :: Utilities
+Requires-Dist: db-dtypes (>=1.2.0,<2.0.0)
+Requires-Dist: google-cloud-bigquery (>=3.40.0,<4.0.0)
+Requires-Dist: rich (>=14.2.0,<15.0.0)
+Requires-Dist: typer (>=0.21.1,<0.22.0)
+Project-URL: Repository, https://github.com/mark-de-haan/query-guard-cli
+Description-Content-Type: text/markdown
+# QueryGuard CLI 🛡️
+**The Forensic Auditor for your BigQuery Bill.**
+[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+**QueryGuard** (`bqg`) is a CLI tool that hunts down expensive BigQuery queries across your entire Google Cloud organization. It connects to the `INFORMATION_SCHEMA`, calculates exact costs based on regional pricing, and flags high-risk patterns like `SELECT *` or missing `LIMIT` clauses.
+Stop guessing who spent the budget. **Know.**
+---
+## ⚡ Features
+* **🌍 Global Auto-Discovery**: Automatically scans your project to find active regions and queries them in parallel. No more guessing if data is in `us-central1` or `europe-west3`.
+* **💸 Forensic Cost Analysis**: Calculates costs based on the **exact datacenter pricing** (e.g., pricing Zurich queries at $8.75/TiB vs. US queries at $6.25/TiB).
+* **🚩 Risk Detection**: Instantly flags bad habits:
+    * `SELECT *` usage
+    * Queries without `LIMIT`
+    * Heavy scans (>100 GB)
+    * Wrapper scripts vs. actual compute
+* **🤖 Bot Filtering**: Use `--humans-only` to filter out service accounts and Looker bots, focusing strictly on manual engineering errors.
+* **🚀 High Performance**: Uses multi-threaded execution to audit dozens of regions in seconds.
+---
+## 📦 Installation
+### Option 1: Using Pip
+```bash
+pip install queryguard-cli
+```
+### Option 2: From Source (Poetry)
+```bash
+# Clone the repo
+git clone git@github.com:mark-de-haan/query-guard-cli.git
+# Navigate
+cd query-guard-cli
+# Install locally
+poetry install
+```
+## 🚀 Quick Start
+Ensure you are authenticated with Google Cloud:
+```bash
+gcloud auth application-default login
+```
+Run a forensic scan on your primary project for the last 7 days:
+```bash
+bqg scan --project my-gcp-project
+```
+#### Global scanning
+Audit every active region globally to find hidden costs:
+```bash
+bqg scan --project my-gcp-project --global
+```
+## 🛠 Usage Guide
+### The `scan` Command
+| Flag | Short | Description|
+| -----|-------|------------|
+| --project | -p | The GCP Project ID to audit. Defaults to the project in your local gcloud configuration. |
+| --global | -g | Auto-discover active regions and scan them all in parallel. |
+| --region | -r | Scan a specific region (e.g., europe-west1). Ignored if --global is set. |
+| --days | -d | Lookback window in days (Default: 7). |
+| --humans-only | | Hides service accounts (e.g., gserviceaccount, monitoring) to find manual errors. |
+| --limit | -l | Number of expensive queries to display (Default: 10). |
+#### Examples
+Find who is running expensive queries manually:
+```bash
+bqg scan -p my-data-warehouse --global --humans-only --days 30
+```
+Audit a specific region for a deep dive:
+```bash
+bqg scan -p my-data-warehouse -r europe-west3
+```
+## 🤝 Contributing
+Contributions are welcome! Please check the issues page.
+1. Fork the Project
+2. Create your Feature Branch (`git checkout -b feat/AmazingFeature`)
+3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
+4. Push to the Branch (`git push origin feat/AmazingFeature`)
+5. Open a Pull Request
+## 📄 License
+Distributed under the MIT License. See LICENSE for more information.

queryguard_cli-0.1.0/README.md ADDED

@@ -0,0 +1,99 @@
+# QueryGuard CLI 🛡️
+**The Forensic Auditor for your BigQuery Bill.**
+[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+**QueryGuard** (`bqg`) is a CLI tool that hunts down expensive BigQuery queries across your entire Google Cloud organization. It connects to the `INFORMATION_SCHEMA`, calculates exact costs based on regional pricing, and flags high-risk patterns like `SELECT *` or missing `LIMIT` clauses.
+Stop guessing who spent the budget. **Know.**
+---
+## ⚡ Features
+* **🌍 Global Auto-Discovery**: Automatically scans your project to find active regions and queries them in parallel. No more guessing if data is in `us-central1` or `europe-west3`.
+* **💸 Forensic Cost Analysis**: Calculates costs based on the **exact datacenter pricing** (e.g., pricing Zurich queries at $8.75/TiB vs. US queries at $6.25/TiB).
+* **🚩 Risk Detection**: Instantly flags bad habits:
+    * `SELECT *` usage
+    * Queries without `LIMIT`
+    * Heavy scans (>100 GB)
+    * Wrapper scripts vs. actual compute
+* **🤖 Bot Filtering**: Use `--humans-only` to filter out service accounts and Looker bots, focusing strictly on manual engineering errors.
+* **🚀 High Performance**: Uses multi-threaded execution to audit dozens of regions in seconds.
+---
+## 📦 Installation
+### Option 1: Using Pip
+```bash
+pip install queryguard-cli
+```
+### Option 2: From Source (Poetry)
+```bash
+# Clone the repo
+git clone git@github.com:mark-de-haan/query-guard-cli.git
+# Navigate
+cd query-guard-cli
+# Install locally
+poetry install
+```
+## 🚀 Quick Start
+Ensure you are authenticated with Google Cloud:
+```bash
+gcloud auth application-default login
+```
+Run a forensic scan on your primary project for the last 7 days:
+```bash
+bqg scan --project my-gcp-project
+```
+#### Global scanning
+Audit every active region globally to find hidden costs:
+```bash
+bqg scan --project my-gcp-project --global
+```
+## 🛠 Usage Guide
+### The `scan` Command
+| Flag | Short | Description|
+| -----|-------|------------|
+| --project | -p | The GCP Project ID to audit. Defaults to the project in your local gcloud configuration. |
+| --global | -g | Auto-discover active regions and scan them all in parallel. |
+| --region | -r | Scan a specific region (e.g., europe-west1). Ignored if --global is set. |
+| --days | -d | Lookback window in days (Default: 7). |
+| --humans-only | | Hides service accounts (e.g., gserviceaccount, monitoring) to find manual errors. |
+| --limit | -l | Number of expensive queries to display (Default: 10). |
+#### Examples
+Find who is running expensive queries manually:
+```bash
+bqg scan -p my-data-warehouse --global --humans-only --days 30
+```
+Audit a specific region for a deep dive:
+```bash
+bqg scan -p my-data-warehouse -r europe-west3
+```
+## 🤝 Contributing
+Contributions are welcome! Please check the issues page.
+1. Fork the Project
+2. Create your Feature Branch (`git checkout -b feat/AmazingFeature`)
+3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
+4. Push to the Branch (`git push origin feat/AmazingFeature`)
+5. Open a Pull Request
+## 📄 License
+Distributed under the MIT License. See LICENSE for more information.
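
Note on the README's pricing claim: the arithmetic behind "pricing Zurich queries at $8.75/TiB vs. US queries at $6.25/TiB" can be checked with a short worked example. This is a minimal sketch, not the package's API: the name `estimate_cost` and the two-entry rate table are assumptions made for illustration, and the rates are simply the two figures quoted in the README.

```python
# Illustrative only: the two rates are the figures quoted in the README.
PRICE_PER_TIB = {"us": 6.25, "europe-west6": 8.75}  # USD per TiB scanned
TIB = 1024 ** 4  # bytes in one tebibyte


def estimate_cost(bytes_billed: int, region: str, default_price: float = 6.25) -> float:
    """Return the estimated on-demand cost in USD for one query."""
    # Unknown regions fall back to the default rate (an assumption of this sketch).
    price = PRICE_PER_TIB.get(region.lower(), default_price)
    return bytes_billed / TIB * price


two_tib = 2 * TIB
print(estimate_cost(two_tib, "us"))            # 12.5
print(estimate_cost(two_tib, "europe-west6"))  # 17.5
```

The same 2 TiB scan costs $5.00 more in Zurich than in the US multi-region under these rates, which is the kind of difference the regional table is meant to surface.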

queryguard_cli-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,32 @@
+[tool.poetry]
+name = "queryguard-cli"
+version = "0.1.0"
+description = "A BigQuery Cost Analysis CLI Tool"
+authors = ["Mark de Haan <markdehaan90@gmail.com>"]
+readme = "README.md"
+packages = [{include = "queryguard"}]
+repository = "https://github.com/mark-de-haan/query-guard-cli"
+keywords = ["bigquery", "gcp", "cost-optimization", "finops", "cli"]
+classifiers = [
+    "Topic :: Database",
+    "Topic :: Utilities",
+    "Intended Audience :: Developers",
+    "Intended Audience :: System Administrators",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.12",
+    "Operating System :: OS Independent",
+]
+[tool.poetry.dependencies]
+python = "^3.12"
+typer = "^0.21.1"
+rich = "^14.2.0"
+google-cloud-bigquery = "^3.40.0"
+db-dtypes = "^1.2.0"
+[tool.poetry.scripts]
+bqg = "queryguard.main:app"
+[build-system]
+requires = ["poetry-core"]
+build-backend = "poetry.core.masonry.api"
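
Note on the dependency constraints: the caret constraints in `pyproject.toml` (for example `typer = "^0.21.1"`) correspond to the explicit ranges listed under `Requires-Dist` in PKG-INFO (`>=0.21.1,<0.22.0`). The sketch below shows one way that expansion can be computed. The helper `expand_caret` is hypothetical, handles only plain numeric versions with at least one non-zero component, and is not Poetry's implementation.

```python
def expand_caret(constraint: str) -> str:
    """Expand a caret constraint such as '^1.2.0' into an explicit range."""
    version = constraint.removeprefix("^")
    parts = [int(p) for p in version.split(".")]
    # The upper bound bumps the leftmost non-zero component and zeroes the rest.
    for i, part in enumerate(parts):
        if part != 0:
            upper = parts[:i] + [part + 1]
            break
    else:
        raise ValueError("this sketch expects at least one non-zero component")
    upper += [0] * (len(parts) - len(upper))
    return f">={version},<{'.'.join(map(str, upper))}"


print(expand_caret("^0.21.1"))  # >=0.21.1,<0.22.0
print(expand_caret("^1.2.0"))   # >=1.2.0,<2.0.0
print(expand_caret("^3.12"))    # >=3.12,<4.0
```

The printed ranges match the `Requires-Dist` and `Requires-Python` entries in the PKG-INFO hunk above.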

queryguard_cli-0.1.0/queryguard/__init__.py ADDED

File without changes

queryguard_cli-0.1.0/queryguard/analysis.py ADDED

@@ -0,0 +1,67 @@
+import re
+# Pricing per TiB (approximate on-demand rates)
+REGION_PRICING_TABLE: dict[str, float] = {
+    # --- The "Standard" Tier ($6.25) ---
+    "us": 6.25,
+    "us-central1": 6.25,
+    "us-east1": 6.25,
+    "us-east4": 6.25,
+    "us-west1": 6.25,
+    "eu": 6.25,
+    "europe-west1": 6.25,  # Belgium
+    "europe-north1": 6.25,  # Finland
+    # --- The "High Energy" Tier (~$7.50 - $8.75) ---
+    "europe-west4": 7.50,    # Netherlands
+    "europe-west2": 7.82,    # London
+    "europe-west3": 8.13,    # Frankfurt
+    "europe-west6": 8.75,    # Zurich
+    "us-west2": 8.44,        # Los Angeles
+    "us-west3": 8.44,        # Salt Lake City
+    # --- Other regions (rates from $7.40 up to $11.25) ---
+    "southamerica-east1": 11.25,  # Sao Paulo
+    "asia-northeast1": 7.40,     # Tokyo
+    "asia-southeast1": 7.80,     # Singapore
+    "me-central2": 10.00,        # Dammam
+}
+DEFAULT_PRICE: float = 6.25
+class ForensicAuditor:
+    @staticmethod
+    def get_price_per_tib(region: str) -> float:
+        """Returns the price per TiB for the given region, defaulting to $6.25."""
+        return REGION_PRICING_TABLE.get(region.lower(), DEFAULT_PRICE)
+    @staticmethod
+    def calculate_cost(bytes_billed: int, region: str) -> float:
+        """Calculates cost based on region-specific pricing."""
+        if not bytes_billed:
+            return 0.0
+        tebibytes: float = bytes_billed / (1024**4)
+        price = ForensicAuditor.get_price_per_tib(region)
+        return tebibytes * price
+    @staticmethod
+    def analyze_query(sql: str, bytes_billed: int) -> list[str]:
+        """Returns a list of detected issues."""
+        risks: list[str] = []
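+        # 1. SELECT * (no trailing whitespace required after the asterisk)
+        if re.search(r'\bSELECT\s+\*', sql, re.IGNORECASE):
+            risks.append("SELECT *")
+        # 2. Missing LIMIT (whole-word match, so names like rate_limit do not count)
+        if not re.search(r'\bLIMIT\b', sql, re.IGNORECASE):
+            risks.append("NO LIMIT")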
+        # 3. High Scan Volume (> 100 GB)
+        if bytes_billed > (100 * 1024**3):
+            risks.append("HEAVY SCAN")
+        return risks
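
Note on analysis.py: the three checks in `analyze_query` are text heuristics and can be exercised in isolation. The standalone sketch below restates them with word-boundary regular expressions so that an identifier such as `rate_limit` is not mistaken for a `LIMIT` clause. The function name `detect_risks` and the constant names are assumptions for illustration. Like any regex-based check, it can still be fooled by keywords inside comments or string literals.

```python
import re

SELECT_STAR = re.compile(r"\bSELECT\s+\*", re.IGNORECASE)
LIMIT_KEYWORD = re.compile(r"\bLIMIT\b", re.IGNORECASE)
HEAVY_SCAN_BYTES = 100 * 1024 ** 3  # same threshold as the package: 100 * 1024**3 bytes


def detect_risks(sql: str, bytes_billed: int) -> list[str]:
    """Return the risk flags that apply to one query."""
    risks: list[str] = []
    if SELECT_STAR.search(sql):
        risks.append("SELECT *")
    if not LIMIT_KEYWORD.search(sql):
        risks.append("NO LIMIT")
    if bytes_billed > HEAVY_SCAN_BYTES:
        risks.append("HEAVY SCAN")
    return risks


print(detect_risks("select *\nfrom t", 0))
# ['SELECT *', 'NO LIMIT']
print(detect_risks("SELECT rate_limit FROM t", 200 * 1024 ** 3))
# ['NO LIMIT', 'HEAVY SCAN']
print(detect_risks("SELECT id FROM t LIMIT 10", 0))
# []
```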

queryguard_cli-0.1.0/queryguard/client.py ADDED

@@ -0,0 +1,121 @@
+from concurrent import futures
+import sys
+from typing import cast
+from google.auth import default
+from google.auth.credentials import Credentials
+from google.cloud.bigquery import Client as BigQueryClient
+def get_bq_client(project_id: str | None = None) -> BigQueryClient:
+    """
+    Authenticates using local Application Default Credentials (ADC).
+    Returns a configured BigQueryClient.
+    """
+    try:
+        credentials, default_project = default()
+        target_project = project_id or default_project
+        if not target_project:
+            print(
+                "Error: No Google Cloud project found. Please pass --project or set a default in gcloud.")
+            sys.exit(1)
+        return BigQueryClient(project=target_project, credentials=cast(Credentials, credentials))
+    except Exception as e:
+        print(f"Authentication Error: {e}")
+        print("Tip: Run 'gcloud auth application-default login' to authenticate.")
+        sys.exit(1)
+def discover_active_regions(client: BigQueryClient, project_id: str) -> list[str]:
+    """
+    Auto-detects active regions by listing datasets.
+    Accesses _properties directly as DatasetListItem does not expose .location.
+    """
+    print(f"   ... Auto-discovering active regions for {project_id} ...")
+    try:
+        datasets = list(client.list_datasets(project=project_id))
+    except Exception as e:
+        print(f"   Warning: Could not list datasets to discover regions ({e})")
+        return ["us", "eu"] # Fallback defaults
+    regions = set()
+    for dataset in datasets:
+        # Accessing the raw resource dict
+        props = dataset._properties
+        if "location" in props:
+            regions.add(props["location"].lower())
+    found = list(regions)
+    if not found:
+        print("   Warning: No datasets found. Defaulting to 'us'.")
+        return ["us"]
+    print(f"   Found active data in: {', '.join(found)}")
+    return found
+def _fetch_single_region(client: BigQueryClient, project_id: str, region: str, days: int, limit: int) -> list[dict]:
+    """Worker: Scans one specific region."""
+    table_id = f"`{project_id}`.`region-{region}`.INFORMATION_SCHEMA.JOBS"
+    query = f"""
+        SELECT
+            job_id,
+            user_email,
+            total_bytes_billed,
+            query,
+            creation_time,
+            total_slot_ms,
+            statement_type
+        FROM
+            {table_id}
+        WHERE
+            creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)
+            AND job_type = 'QUERY'
+            AND statement_type != 'SCRIPT'
+            AND total_bytes_billed > 0
+        ORDER BY
+            total_bytes_billed DESC
+        LIMIT {limit}
+    """
+    try:
+        query_job = client.query(query)
+        return [dict(row) | {"region": region} for row in query_job.result()]
+    except Exception:
+        return []
+def fetch_recent_jobs(client: BigQueryClient, project_id: str, region: str, days: int, global_scan: bool, limit: int) -> list[dict]:
+    """
+    Fetches jobs from a single region OR auto-discovers all regions if global_scan is True.
+    """
+    if not global_scan:
+        return _fetch_single_region(client, project_id, region.lower(), days, limit)
+    regions_to_scan: list[str] = discover_active_regions(client, project_id)
+    all_jobs: list[dict] = []
+    print(f"   ... Scanning {len(regions_to_scan)} regions in parallel ...")
+    with futures.ThreadPoolExecutor(max_workers=10) as executor:
+        future_to_region = {
+            executor.submit(_fetch_single_region, client, project_id, r, days, limit): r
+            for r in regions_to_scan
+        }
+        for future in futures.as_completed(future_to_region):
+            data = future.result()
+            if data:
+                all_jobs.extend(data)
+    seen_jobs = set()
+    unique_jobs = []
+    for job in all_jobs:
+        if job['job_id'] not in seen_jobs:
+            unique_jobs.append(job)
+            seen_jobs.add(job['job_id'])
+    unique_jobs.sort(key=lambda x: x.get('total_bytes_billed', 0), reverse=True)
+    return unique_jobs[:limit]
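
Note on client.py: the global scan follows a fan-out, merge, de-duplicate, rank pattern. The sketch below reproduces that pattern without any BigQuery dependency. `fetch_region` is a hypothetical stand-in that returns canned rows, and `top_jobs` is an illustrative name, so this shows the control flow rather than the package's actual functions.

```python
from concurrent import futures


def fetch_region(region: str) -> list[dict]:
    """Hypothetical stand-in for a per-region INFORMATION_SCHEMA.JOBS query."""
    fake_results = {
        "us": [{"job_id": "a", "total_bytes_billed": 300},
               {"job_id": "b", "total_bytes_billed": 100}],
        "eu": [{"job_id": "c", "total_bytes_billed": 200},
               {"job_id": "a", "total_bytes_billed": 300}],  # duplicate job_id
    }
    return [row | {"region": region} for row in fake_results.get(region, [])]


def top_jobs(regions: list[str], limit: int) -> list[dict]:
    """Query all regions in parallel and return the `limit` largest unique jobs."""
    all_jobs: list[dict] = []
    with futures.ThreadPoolExecutor(max_workers=10) as executor:
        pending = {executor.submit(fetch_region, r): r for r in regions}
        for future in futures.as_completed(pending):
            all_jobs.extend(future.result())
    # Keep the first row seen for each job_id.
    unique: dict[str, dict] = {}
    for job in all_jobs:
        unique.setdefault(job["job_id"], job)
    ranked = sorted(unique.values(), key=lambda j: j["total_bytes_billed"], reverse=True)
    return ranked[:limit]


print([job["job_id"] for job in top_jobs(["us", "eu"], limit=2)])  # ['a', 'c']
```

Because results arrive in completion order, which region is recorded for a duplicated `job_id` can vary between runs; the ranking by bytes billed does not.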

queryguard_cli-0.1.0/queryguard/console_utils.py ADDED

@@ -0,0 +1,20 @@
+from rich.console import Console
+from rich.panel import Panel
+from rich.table import Table
+def print_audit_results(
+    console: Console,
+    table: Table,
+    total_spend: float,
+    displayed_count: int
+) -> None:
+    """Prints the audit results in a formatted table."""
+    console.print(table)
+    console.print(Panel(
+        f"[bold]Total Cost in View: ${total_spend:.2f}[/bold]\n"
+        f"Showing {displayed_count} queries.",
+        title="Audit Summary",
+        border_style="white",
+        expand=False
+    ))

queryguard_cli-0.1.0/queryguard/main.py ADDED

@@ -0,0 +1,127 @@
+from rich.console import Console
+from rich.panel import Panel
+from rich.progress import Progress, SpinnerColumn, TextColumn
+from rich.table import Table
+from typer import Context, Exit, Option, Typer
+from google.cloud.bigquery import Client as BigQueryClient
+from .analysis import ForensicAuditor
+from .client import fetch_recent_jobs, get_bq_client
+from .console_utils import print_audit_results
+app: Typer = Typer()
+console: Console = Console()
+@app.callback(invoke_without_command=True)
+def main_callback(ctx: Context):
+    """
+    QueryGuard CLI - Audit BigQuery spend and find cost anomalies.
+    """
+    if ctx.invoked_subcommand is None:
+        console.print(Panel(
+            "[bold]QueryGuard CLI[/bold]\n\n"
+            "The forensic auditor for your BigQuery bills.\n\n"
+            "Usage:\n"
+            "  [cyan]bqg scan[/cyan]    Start a forensic audit\n"
+            "  [cyan]bqg scan --help[/cyan]  Show all options",
+            title="Welcome",
+            border_style="blue"
+        ))
+        raise Exit()
+@app.command("scan")
+def scan(
+    project_id: str | None = Option(None, "--project", "-p", help="GCP Project ID. Defaults to local config."),
+    region: str = Option("us", "--region", "-r", help="BigQuery Region (e.g. us, eu, europe-west4)."),
+    days: int = Option(7, "--days", "-d", help="Lookback window in days."),
+    global_scan: bool = Option(False, "--global", "-g", help="Auto-discover and scan all active regions."),
+    limit: int = Option(10, "--limit", "-l", help="Number of queries to show."),
+    humans_only: bool = Option(False, "--humans-only", help="Filter out service accounts and bots."),
+):
+    """
+    Audit BigQuery spend. Use --humans-only to find manual errors.
+    """
+    client: BigQueryClient = get_bq_client(project_id)
+    target_project_id: str = client.project
+    region_display = "GLOBAL (Auto-Discovery)" if global_scan else region
+    console.print("[bold]QueryGuard Forensic Scan[/bold]")
+    console.print(f"Target: [cyan]{target_project_id}[/cyan] | Region: [cyan]{region_display}[/cyan] | Lookback: [cyan]{days} days[/cyan]\n")
+    with Progress(
+        SpinnerColumn(),
+        TextColumn("[progress.description]{task.description}"),
+        transient=True,
+    ) as progress:
+        progress.add_task(description="Scanning audit logs...", total=None)
+        rows = fetch_recent_jobs(client, target_project_id, region, days, global_scan, limit=limit * 5)
+        jobs = list(rows)
+    if not jobs:
+        console.print("[yellow]No billed queries found in the specified period.[/yellow]")
+        raise Exit()
+    # 3. Build Table
+    table: Table = Table(
+        title=f"Top Expensive Queries ({'Humans Only' if humans_only else 'All Users'})",
+        border_style="white",
+        box=None,
+        header_style="bold cyan"
+    )
+    table.add_column("User", style="white", no_wrap=True)
+    table.add_column("Region", justify="right")
+    table.add_column("Data", justify="right")
+    table.add_column("Cost", justify="right", style="green")
+    table.add_column("Flags", style="red")
+    table.add_column("Query Snippet", style="dim")
+    displayed_count: int = 0
+    total_spend: float = 0.0
+    job: dict
+    for job in jobs:
+        sql = job.get('query') or ""
+        user_email = job.get('user_email') or "unknown"
+        total_bytes_billed = job.get('total_bytes_billed') or 0
+        job_region: str = job.get('region') or "us"
+        if humans_only:
+            # Common patterns for bots/service accounts
+            if "gserviceaccount" in user_email or "monitoring" in user_email:
+                continue
+        # FIXME calculate more precise region if global scan
+        cost: float = ForensicAuditor.calculate_cost(total_bytes_billed, job_region)
+        total_spend += cost
+        risks: list[str] = ForensicAuditor.analyze_query(sql, total_bytes_billed)
+        gb_scanned: float = total_bytes_billed / (1024**3)
+        clean_query: str = " ".join(sql.replace("\n", " ").split())
+        query_snippet: str = clean_query[:60] + "..." if len(clean_query) > 60 else clean_query
+        table.add_row(
+            user_email.split('@')[0],
+            job_region,
+            f"{gb_scanned:.2f} GB",
+            f"${cost:.2f}",
+            ", ".join(risks) if risks else "[dim]OK[/dim]",
+            query_snippet
+        )
+        displayed_count += 1
+        if displayed_count >= limit:
+            break
+    print_audit_results(console, table, total_spend, displayed_count)
+if __name__ == "__main__":
+    app()
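
Note on main.py: two small pieces of the `scan` loop are easy to test on their own, namely the `--humans-only` filter and the 60-character query snippet. The sketch below isolates both. The function names `is_probable_bot` and `make_snippet` are assumptions for illustration; the substring patterns and the width are the ones used in the hunk above.

```python
def is_probable_bot(user_email: str) -> bool:
    """Substring heuristic for service accounts and monitoring identities."""
    return "gserviceaccount" in user_email or "monitoring" in user_email


def make_snippet(sql: str, width: int = 60) -> str:
    """Collapse all whitespace runs and truncate to `width` characters."""
    clean = " ".join(sql.split())
    return clean[:width] + "..." if len(clean) > width else clean


print(is_probable_bot("etl@my-project.iam.gserviceaccount.com"))  # True
print(is_probable_bot("alice@example.com"))                       # False
print(make_snippet("SELECT *\n  FROM   t"))                       # SELECT * FROM t
```

Since the filter runs client-side after rows are fetched, the command requests `limit * 5` rows, presumably so that enough human-issued queries remain once bots are dropped.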