dbr_logs-0.1.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- dbr_logs-0.1.0/.claude-plugin/plugin.json +6 -0
- dbr_logs-0.1.0/.claude-plugin/skills/dbr-logs/SKILL.md +155 -0
- dbr_logs-0.1.0/.claude-plugin/skills/dbr-logs/references/log-structure.md +75 -0
- dbr_logs-0.1.0/.github/dependabot.yml +17 -0
- dbr_logs-0.1.0/.github/workflows/ci.yml +38 -0
- dbr_logs-0.1.0/.github/workflows/publish.yml +70 -0
- dbr_logs-0.1.0/.gitignore +11 -0
- dbr_logs-0.1.0/.python-version +1 -0
- dbr_logs-0.1.0/AGENTS.md +1 -0
- dbr_logs-0.1.0/CLAUDE.md +29 -0
- dbr_logs-0.1.0/LICENSE +21 -0
- dbr_logs-0.1.0/PKG-INFO +194 -0
- dbr_logs-0.1.0/README.md +174 -0
- dbr_logs-0.1.0/pyproject.toml +75 -0
- dbr_logs-0.1.0/spec/01-cli-tool.md +450 -0
- dbr_logs-0.1.0/spec/02-claude-skill.md +158 -0
- dbr_logs-0.1.0/spec/03-tui-mode.md +83 -0
- dbr_logs-0.1.0/src/dbr_logs/__init__.py +1 -0
- dbr_logs-0.1.0/src/dbr_logs/_version.py +34 -0
- dbr_logs-0.1.0/src/dbr_logs/cli.py +135 -0
- dbr_logs-0.1.0/src/dbr_logs/config.py +118 -0
- dbr_logs-0.1.0/src/dbr_logs/databricks_client.py +171 -0
- dbr_logs-0.1.0/src/dbr_logs/discovery.py +216 -0
- dbr_logs-0.1.0/src/dbr_logs/fetcher.py +75 -0
- dbr_logs-0.1.0/src/dbr_logs/filters.py +63 -0
- dbr_logs-0.1.0/src/dbr_logs/formatter.py +56 -0
- dbr_logs-0.1.0/src/dbr_logs/merger.py +40 -0
- dbr_logs-0.1.0/src/dbr_logs/models.py +55 -0
- dbr_logs-0.1.0/src/dbr_logs/noise.py +80 -0
- dbr_logs-0.1.0/src/dbr_logs/parser.py +71 -0
- dbr_logs-0.1.0/src/dbr_logs/resolver.py +67 -0
- dbr_logs-0.1.0/tests/__init__.py +0 -0
- dbr_logs-0.1.0/tests/conftest.py +8 -0
- dbr_logs-0.1.0/tests/test_cli.py +78 -0
- dbr_logs-0.1.0/tests/test_discovery.py +153 -0
- dbr_logs-0.1.0/tests/test_fetcher.py +106 -0
- dbr_logs-0.1.0/tests/test_filters.py +74 -0
- dbr_logs-0.1.0/tests/test_formatter.py +75 -0
- dbr_logs-0.1.0/tests/test_merger.py +61 -0
- dbr_logs-0.1.0/tests/test_noise.py +92 -0
- dbr_logs-0.1.0/tests/test_parser.py +66 -0
- dbr_logs-0.1.0/tests/test_resolver.py +108 -0
- dbr_logs-0.1.0/uv.lock +772 -0
dbr_logs-0.1.0/.claude-plugin/skills/dbr-logs/SKILL.md
ADDED

---
name: dbr-logs
description: Fetch, search, and analyze Databricks job logs. Use when user mentions "job logs", "databricks logs", "executor logs", "driver logs", "spark job failed", "check logs for", or asks to debug a Databricks job failure. Do NOT use for general Spark code questions or Databricks cluster configuration.
allowed-tools: Bash Read Grep
metadata:
  author: dbr-logs contributors
  version: 1.0.0
---

# dbr-logs: Fetch and Analyze Databricks Job Logs

You are a Databricks job log analyst. Follow these steps to fetch, analyze, and explain job logs.

## Step 0: Ensure CLI is available

Check if the `dbr-logs` CLI is accessible. Try each tier in order:

```bash
which dbr-logs
```

1. **Found** -> use `dbr-logs` directly
2. **Not found** -> check for `uvx`:
   ```bash
   which uvx
   ```
   - If `uvx` is available -> use `uvx --from dbr-logs dbr-logs <args>` for all commands below
   - If `uvx` is not available -> ask the user:
     > `dbr-logs` CLI not found. Install options:
     > - `uv tool install dbr-logs`
     > - `pip install dbr-logs`
     >
     > Want me to install it?
   - If the user declines -> fall back to raw `databricks fs ls` / `databricks fs cat` commands. Warn: "Using raw Databricks CLI (no log merging or filtering). Install dbr-logs for a better experience." Load `references/log-structure.md` for directory layout guidance.

For the rest of these instructions, `DBR_LOGS` refers to whichever invocation method was resolved above (`dbr-logs`, `uvx --from dbr-logs dbr-logs`, etc.).
## Step 1: Resolve the target job

- If the user provides a **job name** -> use it directly
- If the user provides a **Databricks URL** -> pass the full URL as the positional argument (the CLI parses job/run from it)
- If the user **describes a failure without naming a job** -> ask which job to investigate
- If the user specifies a **source** (e.g. "check executor logs", "look at the driver") -> use `--source` accordingly
- Default environment is `prod`. Only add `--env <env>` if the user specifies a different environment.
## Step 2: Fetch logs via CLI

Run `DBR_LOGS` with appropriate flags. **Always use `--format jsonl`** when you (Claude) are consuming the output — structured data is easier to analyze. Use `--format text` only when the user wants raw output displayed directly.

**Priority: match the user's intent.** If the user asks to search for a specific string or pattern, pipe the output to `grep` rather than adding `--level` filtering — the match may appear at any log level (INFO, DEBUG, etc.). Only default to `--level ERROR,WARN` when the user asks about failures/errors without specifying what to search for. Similarly, if the user specifies a source (e.g. "executor logs"), honor that with `--source` rather than fetching all sources.

**Always use `--focus`** unless the user explicitly asks for raw/unfiltered output. This suppresses Spark/JVM noise (thread dumps, shuffle lifecycle, task assignments) that buries application logs.

### Common patterns

```bash
# User asks about errors/failures (no specific search term)
DBR_LOGS <job-name> --level ERROR,WARN --focus --format jsonl

# Specific run
DBR_LOGS <job-name> --run-id <run-id> --level ERROR,WARN --focus --format jsonl

# User says "check executor logs" (honor the source, fetch all levels)
DBR_LOGS <job-name> --source executor --focus --format jsonl

# Executor errors specifically
DBR_LOGS <job-name> --source executor --level ERROR,WARN --focus --format jsonl

# Single executor deep dive
DBR_LOGS <job-name> --source executor:3 --focus --format jsonl

# User asks to search for a specific string (pipe to grep, no --level)
DBR_LOGS <job-name> --focus --format jsonl | grep "partition count"

# Search for a specific error pattern
DBR_LOGS <job-name> --focus --format jsonl | grep "OutOfMemoryError"

# Driver only
DBR_LOGS <job-name> --source driver --focus --format jsonl

# Include log4j or stacktrace files
DBR_LOGS <job-name> --include-log4j --include-stacktrace --focus --format jsonl

# Logs from the last hour
DBR_LOGS <job-name> --since 1h --focus --format jsonl

# Staging environment
DBR_LOGS <job-name> --env staging --focus --format jsonl

# Raw unfiltered output (no noise suppression)
DBR_LOGS <job-name> --format jsonl
```

### CLI reference

| Option | Short | Description |
|---|---|---|
| `<job>` | positional | Job name or Databricks workspace URL |
| `--run-id` | `-r` | Run ID. Omit for latest run. |
| `--env` | `-e` | `prod` (default), `staging`, `ondemand` |
| `--dbr-profile` | `-p` | Databricks CLI profile name |
| `--source` | `-s` | `driver`, `executor`, `executor:N`, `all` (default) |
| `--stream` | | `stderr`, `stdout`, `all` (default) |
| `--level` | `-l` | Exact match, comma-separated: `ERROR`, `WARN`, `INFO`, `DEBUG` |
| `--include-log4j` | | Include driver log4j files |
| `--include-stacktrace` | | Include driver stacktrace files |
| `--format` | `-f` | `text` or `jsonl` |
| `--tail` | `-n` | Show only last N lines |
| `--since` | | Logs since time (e.g. `1h`, `30m`, ISO datetime) |
| `--focus` | | Suppress Spark/JVM noise (thread dumps, shuffle, task lifecycle) |
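The `--since` forms listed in the reference (`1h`, `30m`, ISO datetime) can be interpreted as in this sketch. It only illustrates the documented value shapes; it is not the package's actual parser:

```python
# Hedged sketch: interpret a --since value relative to a reference time.
# Handles the documented forms: <N>h, <N>m, or an ISO datetime string.
from datetime import datetime, timedelta

def parse_since(value: str, now: datetime) -> datetime:
    units = {"h": "hours", "m": "minutes"}
    if value and value[-1] in units and value[:-1].isdigit():
        return now - timedelta(**{units[value[-1]]: int(value[:-1])})
    return datetime.fromisoformat(value)  # e.g. "2026-03-11T17:00:00"

now = datetime(2026, 3, 11, 18, 0)
print(parse_since("1h", now))   # -> 2026-03-11 17:00:00
print(parse_since("30m", now))  # -> 2026-03-11 17:30:00
```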
## Step 3: Analyze the output

Parse the JSONL output and look for these root cause patterns:

| Pattern | Likely cause | Key fields to check |
|---|---|---|
| `OutOfMemoryError` / `java.lang.OutOfMemoryError` | Executor or driver memory too small | Which source (driver vs executor), heap vs off-heap |
| `Connection refused` / `ShuffleBlockFetcher` / `TransportChannelHandler` | Network or shuffle issues, node went unhealthy | Target IP, timeout duration, which executor |
| `RESOURCE_DOES_NOT_EXIST` | Missing table, view, or path | Resource name in error message |
| `AnalysisException` | SQL/schema issues (column not found, type mismatch) | SQL statement or column name |
| `HangingTaskDetector` | Data skew or stuck tasks | Task IDs, duration, which executor |
| `FileNotFoundException` / `FileAlreadyExistsException` | Concurrent writes or stale metadata | File path |
| `SparkException: Job aborted` | Upstream task failure cascade | Root cause in "caused by" chain |
| `Py4JJavaError` | Python-side error propagated to JVM | Python traceback in the message |

When analyzing:

1. **Group errors by source** (driver vs specific executors)
2. **Identify the root cause** — often the first error chronologically is the root cause; later errors are cascading failures
3. **Note the timeline** — when errors started, how long the job ran before failing
4. **Check for patterns across executors** — the same error on all executors suggests a systemic issue; one executor suggests data skew or a node problem
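The grouping and root-cause steps above can be sketched as follows. The entry field names (`ts`, `source`, `level`, `msg`) are assumptions about the jsonl schema, not taken from the CLI spec:

```python
# Hedged sketch of Step 3: group JSONL entries by source and surface the
# chronologically first ERROR as the root-cause candidate.
import json
from collections import defaultdict

def analyze(jsonl_text: str) -> dict:
    by_source: dict[str, list[dict]] = defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("level") in ("ERROR", "WARN"):
            by_source[entry.get("source", "unknown")].append(entry)
    errors = [e for es in by_source.values() for e in es if e["level"] == "ERROR"]
    # ISO-8601 timestamps sort lexicographically, so min() finds the earliest
    root = min(errors, key=lambda e: e["ts"]) if errors else None
    return {"by_source": dict(by_source), "root_cause_candidate": root}

sample = "\n".join([
    json.dumps({"ts": "2026-03-11T17:05:00", "source": "executor:3", "level": "ERROR", "msg": "OutOfMemoryError"}),
    json.dumps({"ts": "2026-03-11T17:06:10", "source": "driver", "level": "ERROR", "msg": "SparkException: Job aborted"}),
    json.dumps({"ts": "2026-03-11T17:04:00", "source": "driver", "level": "INFO", "msg": "stage 12 started"}),
])
result = analyze(sample)
print(result["root_cause_candidate"]["msg"])  # -> OutOfMemoryError (earliest ERROR)
```

Here the later driver `SparkException` is treated as a cascading failure, matching point 2 above.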
## Step 4: Present findings and suggest next steps

Structure your response as:

1. **Summary**: What happened, which run, when
2. **Errors found**: Grouped by source, with key log lines quoted
3. **Root cause assessment**: Your best determination of why the job failed
4. **Suggested actions** based on error type:

| Error type | Suggested actions |
|---|---|
| OOM | Increase executor/driver memory, check for data skew, reduce partition size |
| Shuffle/network | Enable shuffle retry settings, check cluster health, increase shuffle partitions |
| Missing resource | Verify table/path exists, check permissions, check if upstream job ran |
| Schema/SQL | Fix column references, check for schema evolution, verify data types |
| Hanging tasks | Increase shuffle partitions, check for data skew, salt join keys |
| Concurrent write | Check for overlapping job schedules, enable Delta conflict resolution |

If the error is unclear, suggest:
- Checking a specific executor's full logs (`--source executor:N`)
- Looking at driver log4j for more context (`--include-log4j`)
- Comparing with a previous successful run
- Widening the log level to include WARN or INFO
dbr_logs-0.1.0/.claude-plugin/skills/dbr-logs/references/log-structure.md
ADDED

# Databricks Log Directory Structure

This reference is used when falling back to raw `databricks fs` commands (when the `dbr-logs` CLI is not available).

## Directory layout

```
dbfs:/Volumes/catalog/schema/logs/{env}/{job_name}/{run_id}/
├── driver/
│   ├── stderr                           # active file (plain text)
│   ├── stderr--2026-03-11--18-00        # rotated (plain text, NOT gzipped)
│   ├── stderr--2026-03-11--19-00
│   ├── stdout                           # active file (plain text)
│   ├── stdout--2026-03-11--18-00        # rotated (plain text)
│   ├── log4j-active.log                 # active log4j (plain text)
│   ├── log4j-2026-03-11-17.log.gz       # rotated log4j (gzipped)
│   ├── stacktrace.log                   # active stacktrace (plain text)
│   └── 2026-03-11-17.stacktrace.log.gz  # rotated stacktrace (gzipped)
├── executor/
│   └── app-20260311170849-0000/
│       ├── 0/
│       │   ├── stderr                   # active file (plain text)
│       │   ├── stderr--2026-03-11--18.gz  # rotated (gzipped)
│       │   ├── stdout                   # active file (plain text)
│       │   └── stdout--2026-03-11--18.gz  # rotated (gzipped)
│       ├── 1/
│       ├── 2/
│       ...
│       └── N/
├── eventlog/                            # Spark event log (not targeted)
└── init_scripts/                        # optional
```

## Key differences between driver and executor logs

| Property | Driver | Executor |
|---|---|---|
| Rotated stderr/stdout | `stderr--YYYY-MM-DD--HH-MM` (plain text) | `stderr--YYYY-MM-DD--HH.gz` (gzipped) |
| Additional log files | `log4j-*.log.gz`, `stacktrace.log*` (may be absent) | None |
| Executor hierarchy | N/A | `app-{id}/{executor_num}/` |
| Presence | Always present | Only on multi-node jobs |

## Not all files are present in every run

Some driver directories have only `stderr`/`stdout` (no log4j, no stacktrace). Some jobs have no `executor/` directory at all. Always discover what exists rather than assuming a fixed structure.

## Environments

- `prod` — production jobs
- `staging` — staging jobs
- `ondemand` — ad-hoc runs

## Fallback workflow using raw Databricks CLI

When `dbr-logs` is not available, use these commands to manually navigate the log tree:

```bash
# 1. Find the latest run directory
databricks fs ls "dbfs:/Volumes/catalog/schema/logs/prod/<job-name>/" | tail -1

# 2. List available log sources
databricks fs ls "dbfs:/Volumes/catalog/schema/logs/prod/<job-name>/<run-id>/"

# 3. Read driver stderr (most useful starting point)
databricks fs cat "dbfs:/Volumes/catalog/schema/logs/prod/<job-name>/<run-id>/driver/stderr"

# 4. List executor directories (if they exist)
databricks fs ls "dbfs:/Volumes/catalog/schema/logs/prod/<job-name>/<run-id>/executor/"
# Then drill into: app-{id}/{executor_num}/stderr

# 5. For gzipped files, download and decompress locally
databricks fs cp "dbfs:/path/to/file.gz" /tmp/logfile.gz && gunzip -c /tmp/logfile.gz
```

Note: Without `dbr-logs`, you lose chronological merging across sources, level filtering, and regex grep. You must manually navigate each source and file.
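The discovery half of that workflow can be sketched as a small classifier over the paths `databricks fs ls` returns. The helper name and tuple shape are illustrative, not the actual `dbr_logs.discovery` API:

```python
# Hedged sketch: classify a path under a run directory into
# (source, stream, gzipped), following the layout documented above.
from __future__ import annotations

def classify(path: str) -> tuple[str, str, bool] | None:
    parts = path.strip("/").split("/")
    gz = path.endswith(".gz")
    if "driver" in parts:
        name = parts[-1]
        stream = ("stderr" if name.startswith("stderr")
                  else "stdout" if name.startswith("stdout")
                  else "other")  # log4j / stacktrace files
        return ("driver", stream, gz)
    if "executor" in parts:
        i = parts.index("executor")
        executor_num = parts[i + 2]  # executor/app-{id}/{executor_num}/<file>
        name = parts[-1]
        stream = "stderr" if name.startswith("stderr") else "stdout"
        return (f"executor:{executor_num}", stream, gz)
    return None  # eventlog/, init_scripts/, etc. are not log sources

print(classify("run/driver/stderr--2026-03-11--18-00"))  # -> ('driver', 'stderr', False)
print(classify("run/executor/app-20260311170849-0000/3/stderr--2026-03-11--18.gz"))  # -> ('executor:3', 'stderr', True)
```

Note how the driver rotation is plain text (`gzipped=False`) despite its timestamp suffix, matching the table above.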
dbr_logs-0.1.0/.github/dependabot.yml
ADDED

version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    groups:
      dev-dependencies:
        patterns:
          - "pytest*"
          - "ruff"
          - "mypy"

  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
dbr_logs-0.1.0/.github/workflows/ci.yml
ADDED

name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13", "3.14"]

    steps:
      - uses: actions/checkout@v6

      - name: Install uv
        uses: astral-sh/setup-uv@v7

      - name: Set up Python ${{ matrix.python-version }}
        run: uv python install ${{ matrix.python-version }}

      - name: Install dependencies
        run: uv sync --dev

      - name: Lint
        run: uv run ruff check src/ tests/

      - name: Format check
        run: uv run ruff format --check src/ tests/

      - name: Type check
        run: uv run mypy src/

      - name: Test
        run: uv run pytest --cov=dbr_logs --cov-report=term-missing
dbr_logs-0.1.0/.github/workflows/publish.yml
ADDED

name: Release & Publish

on:
  push:
    tags:
      - "v*"

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0

      - name: Install uv
        uses: astral-sh/setup-uv@v7

      - name: Set up Python
        run: uv python install 3.12

      - name: Build package
        run: uv build

      - name: Upload dist artifacts
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/

  github-release:
    name: Create GitHub Release
    needs: build
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v6

      - name: Download dist artifacts
        uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/

      - name: Create GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          generate_release_notes: true
          files: dist/*

  publish:
    name: Publish to PyPI
    if: github.repository == 'zencity/databricks-logs-reader'
    needs: [build, github-release]
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/dbr-logs
    permissions:
      id-token: write
    steps:
      - name: Download dist artifacts
        uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
dbr_logs-0.1.0/.python-version
ADDED

3.11
dbr_logs-0.1.0/AGENTS.md
ADDED

CLAUDE.md
dbr_logs-0.1.0/CLAUDE.md
ADDED

# dbr-logs

CLI tool for fetching and displaying Databricks job logs from Unity Catalog Volumes.

## Commands

```bash
uv sync --dev                           # Install dependencies
uv run python -m pytest                 # Run tests (74 tests)
uv run ruff check src/ tests/           # Lint
uv run ruff format --check src/ tests/  # Format check (add --fix to auto-format)
uv run python -m mypy src/              # Type check
```

## Architecture

Pipeline: `cli.py` -> `resolver.py` -> `discovery.py` -> `fetcher.py` -> `parser.py` -> `merger.py` -> `noise.py` -> `filters.py` -> `formatter.py`

- `databricks_client.py` - SOLE file importing `databricks.sdk`. All other modules use the adapter.
- `models.py` - Data models with `StrEnum` types (`SourceType`, `Stream`) and `logging` level ints
- `config.py` - Profile/config management via `~/.config/dbr-logs/config.toml`

## Gotchas

- **src layout**: Package lives in `src/dbr_logs/` — IDEs may need `src/` marked as source root
- `databricks-sdk` has incomplete type stubs — `databricks_client.py` has `ignore_errors = true` in mypy config
- Driver rotated log files are plain text; executor rotated files are `.gz`
- Level values are `logging.ERROR`/`logging.WARNING` ints, not strings
- All CI must pass: ruff check + ruff format --check + mypy strict + pytest
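The core idea behind the `merger.py` stage (chronological k-way merge of per-source streams, tagging each line with its source) can be sketched as follows. This illustrates the technique; it is not the package's actual code:

```python
# Hedged sketch: k-way merge of already-sorted per-source log streams by
# timestamp, using heapq.merge. Tuples compare by ts first, so output is
# chronological across sources.
import heapq

def merge_streams(streams: dict[str, list[tuple[str, str]]]) -> list[tuple[str, str, str]]:
    """streams maps source -> sorted [(ts, line)]; returns merged (ts, source, line)."""
    tagged = ([(ts, src, line) for ts, line in entries] for src, entries in streams.items())
    return list(heapq.merge(*tagged))

merged = merge_streams({
    "driver": [("17:00:01", "starting"), ("17:00:05", "stage 1")],
    "executor:0": [("17:00:03", "task 0 launched")],
})
print(merged[1])  # -> ('17:00:03', 'executor:0', 'task 0 launched')
```

Because `heapq.merge` is lazy and assumes each input is pre-sorted, this stays memory-friendly even with many executor streams.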
dbr_logs-0.1.0/LICENSE
ADDED

MIT License

Copyright (c) 2026 Zencity

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
dbr_logs-0.1.0/PKG-INFO
ADDED

Metadata-Version: 2.4
Name: dbr-logs
Version: 0.1.0
Summary: Fetch and display Databricks job logs from Unity Catalog Volumes
Project-URL: Homepage, https://github.com/zencity/databricks-logs-reader
Project-URL: Repository, https://github.com/zencity/databricks-logs-reader
Project-URL: Issues, https://github.com/zencity/databricks-logs-reader/issues
Author-email: alonisser <alonisser@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.11
Requires-Dist: click>=8.1
Requires-Dist: databricks-sdk>=0.20.0
Description-Content-Type: text/markdown

# dbr-logs

Fetch and display Databricks job logs from Unity Catalog Volumes.

Merges driver and executor logs chronologically with source labels, so you can pipe them to `grep`, `jq`, or feed them to an LLM.

## Why

Debugging a failed Databricks job means navigating a deeply nested, inconsistently structured log directory tree:

```
dbfs:/Volumes/catalog/schema/logs/prod/my-spark-job/0311-170011-t5450avl/
├── driver/
│   ├── stderr
│   ├── stderr--2026-03-11--18-00   # rotated, plain text
│   ├── stderr--2026-03-11--19-00   # rotated, plain text
│   ├── stdout
│   ├── log4j-active.log
│   └── log4j-2026-03-11-17.log.gz  # rotated, gzipped
├── executor/
│   └── app-20260311170849-0000/    # opaque app ID
│       ├── 0/
│       │   ├── stderr
│       │   ├── stderr--2026-03-11--18.gz  # rotated, gzipped
│       │   └── stdout
│       ├── 1/
│       ├── 2/
│       ...
│       └── 8/
└── eventlog/
```

The manual process to find what went wrong:

1. **Find the run** — run IDs are opaque strings like `0311-170011-t5450avl`, not human-readable
2. **Navigate the tree** — driver logs, executor logs, or both? Which of the 9 executors had the error?
3. **Handle mixed compression** — driver rotated files are plain text, executor rotated files are `.gz`. You need to `databricks fs cp` + `gunzip` to read them
4. **Concatenate rotated files** — a single stream is split across the active file and multiple rotated files that must be read in chronological order
5. **Repeat across executors** — for a job with 9 executors, that's potentially 9 x 4 files to check
6. **Cross-reference timestamps** — the root cause is often in one source, but the symptoms appear in another

For background on how Python logging works in Databricks and why it ends up in this structure, see [Everything You Wanted to Know About Python Logging in Databricks](https://medium.com/python-in-plain-english/everything-you-wanted-to-know-about-python-logging-in-databricks-0c64da6f56c9).

There are heavier alternatives — Databricks' own [Practitioner's Ultimate Guide to Scalable Logging](https://www.databricks.com/blog/practitioners-ultimate-guide-scalable-logging) describes a full logging pipeline, and you could also route Databricks logs to Datadog or similar observability platforms. But these solutions carry significant ongoing costs and infrastructure overhead for something most teams only need occasionally when debugging a failed job. `dbr-logs` is a zero-cost, zero-infrastructure alternative: install a CLI tool, run one command, get your answer.

`dbr-logs` replaces the manual process with a single command. It discovers the log structure, downloads and decompresses all files, merges everything chronologically with source labels, and lets you filter by level, source, or regex.

## Prerequisites

- **Python 3.11+** (tested on 3.11, 3.12, 3.13, 3.14)
- **Databricks CLI** configured with at least one profile in `~/.databrickscfg` ([setup guide](https://docs.databricks.com/dev-tools/cli/index.html))
- **Unity Catalog Volumes** log destination configured on your Databricks jobs (`cluster_log_conf` pointing to a Volumes path)

## Installation

```bash
# Install as a CLI tool with uv (recommended)
uv tool install dbr-logs

# Or with pipx (isolated environment)
pipx install dbr-logs

# Or with pip (use --user to install globally without affecting your venv)
pip install --user dbr-logs

# Or run directly without installing
uvx dbr-logs <job-name>
```

## Usage

```bash
# Fetch logs for the latest run of a job
dbr-logs my-job-name

# Fetch logs from a specific run
dbr-logs my-job-name --run-id 12345

# Use a Databricks workspace URL
dbr-logs "https://dbc-xxx.cloud.databricks.com/jobs/12345/runs/67890?o=123"

# Show only errors
dbr-logs my-job-name --level ERROR

# Focus on application logs (suppress Spark/JVM noise)
dbr-logs my-job-name --focus

# Show only executor logs
dbr-logs my-job-name --source executor

# Show last 50 lines from a specific executor
dbr-logs my-job-name --source executor:3 --tail 50

# JSONL output for piping to jq
dbr-logs my-job-name --format jsonl | jq '.level'

# Logs since last hour
dbr-logs my-job-name --since 1h
```

## Claude Code Integration

`dbr-logs` includes a Claude Code plugin so you can fetch and analyze logs directly from a Claude conversation.

### Install the plugin

```bash
# Option 1: Claude Code Plugin
/plugin add zencity/databricks-logs-reader

# Option 2: skills.sh (works with Claude Code, Cursor, Windsurf, etc.)
npx skills add zencity/databricks-logs-reader
```

The CLI tool still needs to be installed separately (`pip install dbr-logs` or `uv tool install dbr-logs`), but the skill can also use `uvx` as a zero-install fallback.

### Example interaction

```
You: check the logs for my-spark-job

Claude:
  Runs: dbr-logs my-spark-job --level ERROR,WARN --focus --format jsonl
  Analyzes output, then responds:

  "The latest run (0311-170011-t5450avl) has 3 errors:

  - driver/stderr: TransportChannelHandler connection timeout to 10.177.10.232 after 600s
  - executor/0-7 stderr: HangingTaskDetector reports tasks 140953, 140957 stuck for 10+ min

  Root cause: Shuffle read stall — executors are hanging on remote block fetches.

  Suggested actions:
  - Check executor at 10.177.10.232 for resource pressure
  - Enable spark.shuffle.io.retryWait / spark.shuffle.io.maxRetries
  - Review shuffle partitions to reduce per-task data volume"
```

## Options

| Option | Short | Description |
|---|---|---|
| `--run-id` | `-r` | Databricks run ID (numeric). Defaults to latest run. |
| `--env` | `-e` | Environment: prod, staging, ondemand. Default: prod. |
| `--dbr-profile` | `-p` | Databricks CLI profile name. |
| `--source` | `-s` | `driver`, `executor`, `executor:N`, or `all` (default). |
| `--stream` | | `stderr`, `stdout`, or `all` (default). |
| `--level` | `-l` | Exact match, comma-separated: ERROR, WARN, INFO, DEBUG. |
| `--include-log4j` | | Include driver log4j files. |
| `--include-stacktrace` | | Include driver stacktrace files. |
| `--format` | `-f` | `text` (default) or `jsonl`. |
| `--tail` | `-n` | Show only last N lines. |
| `--since` | | Show logs since time (e.g. `1h`, `30m`, ISO datetime). |
| `--focus` | | Suppress Spark/JVM noise (thread dumps, shuffle, task lifecycle). |

## Configuration

On first run with multiple Databricks profiles, you'll be prompted to select a default. Config is saved to `~/.config/dbr-logs/config.toml`.
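A hypothetical shape for that file, for orientation only: the key names below are assumptions, since only the fact that a default profile selection is stored is documented.

```toml
# Hypothetical ~/.config/dbr-logs/config.toml; key names are illustrative.
[defaults]
dbr_profile = "my-workspace"
```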
## Releasing

Version is derived from git tags via [hatch-vcs](https://github.com/ofek/hatch-vcs) — no version string to maintain in source code.

```bash
git tag v0.1.0
git push origin v0.1.0
```

This triggers the CI pipeline, which builds the package -> creates a GitHub Release with auto-generated notes -> publishes to PyPI.

## Limitations

- Only Unity Catalog Volumes log destinations are supported. S3 destinations are not yet supported.
- Jobs must have `cluster_log_conf` configured.