gdalgviz 0.1.0__tar.gz

gdalgviz-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Seth G
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
gdalgviz-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,88 @@
1
+ Metadata-Version: 2.4
2
+ Name: gdalgviz
3
+ Version: 0.1.0
4
+ Summary: CLI tool for visualizing GDALG workflows
5
+ Author: Seth Girvin
6
+ License: MIT
7
+ Requires-Python: >=3.9
8
+ Description-Content-Type: text/markdown
9
+ License-File: LICENSE
10
+ Requires-Dist: graphviz>=0.20
11
+ Provides-Extra: dev
12
+ Requires-Dist: pytest; extra == "dev"
13
+ Requires-Dist: black; extra == "dev"
14
+ Requires-Dist: mypy; extra == "dev"
15
+ Requires-Dist: ruff; extra == "dev"
16
+ Dynamic: license-file
17
+
18
+ # gdalgviz
19
+
20
+ A Python library to visualise [GDAL pipelines](https://gdal.org/en/latest/programs/gdal_pipeline.html).
21
+
22
+ ## Installation
23
+
24
+ Requires [graphviz](https://graphviz.org/download/) to be installed on the system, and
25
+ has a dependency on the [graphviz](https://pypi.org/project/graphviz/) Python package.
26
+
27
+ On Windows:
28
+
29
+ ```powershell
30
+ $GVIZ_PATH = "C:\Program Files\Graphviz\bin"
31
+ $env:PATH = "$GVIZ_PATH;$env:PATH"
32
+ pip install gdalgviz
33
+ gdalgviz --version
34
+ ```
35
+
36
+ ## Usage
37
+
38
+ Passing a pipeline as a JSON file ([tee.json](./examples/tee.json)):
39
+
40
+ ```bash
41
+ gdalgviz ./examples/tee.json ./examples/tee.svg
42
+ ```
43
+
44
+ ![Workflow Diagram](./examples/tee.svg)
45
+
46
+ Passing a pipeline as a string:
47
+
48
+ ```bash
49
+ gdalgviz --pipeline "gdal vector pipeline ! read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom" ./examples/pipeline.svg
50
+ ```
51
+
52
+ ![Workflow Diagram](./examples/pipeline.svg)
53
+
54
+ - Handles both JSON and text input. See [JSON Schema](./examples/gdalg.schema.json)
55
+ - Supports [nested pipelines](https://gdal.org/en/latest/programs/gdal_pipeline.html#nested-pipeline). These
56
+ allow sub-pipelines to be run in parallel and merged later.
57
+ - Supports [tee](https://gdal.org/en/latest/programs/gdal_pipeline.html#output-nested-pipeline) -
58
+ the operation is named "tee" because it splits the stream, like the letter "T": one input, multiple outputs,
59
+ and allows saving of intermediate results
60
+
61
+ This library does not execute the GDAL pipeline; it only visualizes it. Execution is handled by GDAL itself, for example through its Python bindings:
62
+
63
+ ```python
64
+ from osgeo import gdal
65
+
66
+ gdal.UseExceptions()
67
+ with gdal.alg.pipeline(pipeline="read byte.tif ! reproject --dst-crs EPSG:4326 --resampling cubic") as alg:
68
+ ds = alg.Output()
69
+ ```
70
+
71
+ ## Development
72
+
73
+ ```powershell
74
+ pip install -e .[dev]
75
+ black .
76
+ ruff check . --fix
77
+ # mypy .
78
+ pytest tests
79
+ gdalgviz ./examples/tee.json ./examples/tee.svg
80
+ gdalgviz --pipeline "gdal vector pipeline ! read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom" ./examples/pipeline.svg
81
+
82
+ ```
83
+
84
+ ## Roadmap
85
+
86
+ - Add JSON schema validation
87
+ - Add colour coding of the graph depending on whether the command is raster, vector, etc.
88
+ - Add types to the codebase
gdalgviz-0.1.0/README.md ADDED
@@ -0,0 +1,71 @@
1
+ # gdalgviz
2
+
3
+ A Python library to visualise [GDAL pipelines](https://gdal.org/en/latest/programs/gdal_pipeline.html).
4
+
5
+ ## Installation
6
+
7
+ Requires [graphviz](https://graphviz.org/download/) to be installed on the system, and
8
+ has a dependency on the [graphviz](https://pypi.org/project/graphviz/) Python package.
9
+
10
+ On Windows:
11
+
12
+ ```powershell
13
+ $GVIZ_PATH = "C:\Program Files\Graphviz\bin"
14
+ $env:PATH = "$GVIZ_PATH;$env:PATH"
15
+ pip install gdalgviz
16
+ gdalgviz --version
17
+ ```
18
+
19
+ ## Usage
20
+
21
+ Passing a pipeline as a JSON file ([tee.json](./examples/tee.json)):
22
+
23
+ ```bash
24
+ gdalgviz ./examples/tee.json ./examples/tee.svg
25
+ ```
26
+
27
+ ![Workflow Diagram](./examples/tee.svg)
28
+
29
+ Passing a pipeline as a string:
30
+
31
+ ```bash
32
+ gdalgviz --pipeline "gdal vector pipeline ! read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom" ./examples/pipeline.svg
33
+ ```
34
+
35
+ ![Workflow Diagram](./examples/pipeline.svg)
36
+
37
+ - Handles both JSON and text input. See [JSON Schema](./examples/gdalg.schema.json)
38
+ - Supports [nested pipelines](https://gdal.org/en/latest/programs/gdal_pipeline.html#nested-pipeline). These
39
+ allow sub-pipelines to be run in parallel and merged later.
40
+ - Supports [tee](https://gdal.org/en/latest/programs/gdal_pipeline.html#output-nested-pipeline) -
41
+ the operation is named "tee" because it splits the stream, like the letter "T": one input, multiple outputs,
42
+ and allows saving of intermediate results
43
+
44
+ This library does not execute the GDAL pipeline; it only visualizes it. Execution is handled by GDAL itself, for example through its Python bindings:
45
+
46
+ ```python
47
+ from osgeo import gdal
48
+
49
+ gdal.UseExceptions()
50
+ with gdal.alg.pipeline(pipeline="read byte.tif ! reproject --dst-crs EPSG:4326 --resampling cubic") as alg:
51
+ ds = alg.Output()
52
+ ```
53
+
54
+ ## Development
55
+
56
+ ```powershell
57
+ pip install -e .[dev]
58
+ black .
59
+ ruff check . --fix
60
+ # mypy .
61
+ pytest tests
62
+ gdalgviz ./examples/tee.json ./examples/tee.svg
63
+ gdalgviz --pipeline "gdal vector pipeline ! read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom" ./examples/pipeline.svg
64
+
65
+ ```
66
+
67
+ ## Roadmap
68
+
69
+ - Add JSON schema validation
70
+ - Add colour coding of the graph depending on whether the command is raster, vector, etc.
71
+ - Add types to the codebase
gdalgviz-0.1.0/gdalgviz/__init__.py ADDED
@@ -0,0 +1,7 @@
1
+ __version__ = "0.1.0"
2
+
3
+ from .gdalgviz import generate_diagram
4
+
5
+ __all__ = [
6
+ "generate_diagram",
7
+ ]
gdalgviz-0.1.0/gdalgviz/cli.py ADDED
@@ -0,0 +1,86 @@
1
+ import argparse
2
+ import sys
3
+ import json
4
+ from pathlib import Path
5
+ from typing import Optional
6
+
7
+ from gdalgviz import __version__
8
+ from gdalgviz.gdalgviz import generate_diagram
9
+
10
+
11
+ def parse_file(fn: str) -> str:
12
+ """
13
+ Open a file and return its pipeline command.
14
+ If the file is JSON (.json or .JSON), the JSON is parsed and data['command_line'] is returned.
15
+ Otherwise, the raw text content is returned.
16
+ """
17
+ file_path = Path(fn)
18
+
19
+ if file_path.suffix.lower() == ".json":
20
+ with file_path.open("r", encoding="utf-8") as f:
21
+ data = json.load(f)
22
+ return data.get("command_line")
23
+ else:
24
+ with file_path.open("r", encoding="utf-8") as f:
25
+ return f.read()
26
+
27
+
28
+ def main(argv: Optional[list[str]] = None) -> int:
29
+ """
30
+ CLI entry point for gdalgviz.
31
+ Returns an exit code: 0 = success, non-zero = error.
32
+ """
33
+ parser = argparse.ArgumentParser(
34
+ prog="gdalgviz",
35
+ description="Visualize GDALG pipelines from the command line",
36
+ )
37
+
38
+ parser.add_argument(
39
+ "input_path",
40
+ nargs="?",
41
+ help="Path to a GDALG pipeline in JSON or text format",
42
+ )
43
+
44
+ parser.add_argument(
45
+ "output_path",
46
+ help="Path to save the generated diagram (e.g., output.svg)",
47
+ )
48
+
49
+ parser.add_argument(
50
+ "--pipeline",
51
+ help="Provide a raw GDALG pipeline string instead of a file",
52
+ )
53
+
54
+ parser.add_argument(
55
+ "--version",
56
+ action="version",
57
+ version=f"gdalgviz {__version__}",
58
+ )
59
+
60
+ args = parser.parse_args(argv)
61
+
62
+ # validate that input_path exists if not using --pipeline
63
+ if not args.pipeline and args.input_path and not Path(args.input_path).exists():
64
+ print(f"Error: File '{args.input_path}' does not exist.", file=sys.stderr)
65
+ return 1
66
+
67
+ # get the pipeline text
68
+ if args.pipeline:
69
+ pipeline = args.pipeline
70
+ elif args.input_path:
71
+ input_fn = args.input_path
72
+ pipeline = parse_file(input_fn)
73
+ else:
74
+ parser.print_help()
75
+ return 1
76
+
77
+ exit_code = generate_diagram(
78
+ pipeline=pipeline,
79
+ output_fn=args.output_path,
80
+ )
81
+
82
+ return exit_code or 0  # generate_diagram returns None on success
83
+
84
+
85
+ if __name__ == "__main__":
86
+ sys.exit(main())
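
Because `input_path` is declared with `nargs="?"`, it can be `None` after parsing, so any code that touches it has to allow for that. The sketch below rebuilds only the three arguments from `cli.py` and shows what `argparse` produces for each calling style. `build_parser` and `pick_source` are hypothetical names used for illustration; they are not part of the package.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the positional and optional arguments defined in cli.py.
    parser = argparse.ArgumentParser(prog="gdalgviz")
    parser.add_argument("input_path", nargs="?")
    parser.add_argument("output_path")
    parser.add_argument("--pipeline")
    return parser


def pick_source(args: argparse.Namespace) -> Optional[str]:
    """Hypothetical helper: report which input wins, or None if neither is set."""
    if args.pipeline:
        return "pipeline"
    if args.input_path:
        return "file"
    return None


parser = build_parser()
from_string = parser.parse_args(
    ["--pipeline", "read in.tif ! write out.tif", "out.svg"]
)
from_file = parser.parse_args(["pipeline.txt", "out.svg"])
only_output = parser.parse_args(["out.svg"])

print(pick_source(from_string), pick_source(from_file), pick_source(only_output))
```

When a single positional is given, it is assigned to `output_path` and `input_path` stays `None`, which is the case a caller needs to handle before building a `Path`.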
gdalgviz-0.1.0/gdalgviz/commands.py ADDED
@@ -0,0 +1,129 @@
1
+ # Raster commands
2
+ RASTER_COMMANDS = {
3
+ "raster": "Entry point for raster commands",
4
+ "info": "Get information on a raster dataset",
5
+ "as-features": "Create features representing raster pixels",
6
+ "aspect": "Generate an aspect map",
7
+ "blend": "Blend/compose two raster datasets",
8
+ "calc": "Perform raster algebra",
9
+ "clean-collar": "Clean the collar of a raster dataset, removing noise",
10
+ "clip": "Clip a raster dataset",
11
+ "color-map": "Use a grayscale raster to replace the intensity of a RGB/RGBA dataset",
12
+ "compare": "Compare two raster datasets",
13
+ "convert": "Convert a raster dataset",
14
+ "contour": "Builds vector contour lines from a raster elevation model",
15
+ "create": "Create a new raster dataset",
16
+ "edit": "Edit in place a raster dataset",
17
+ "footprint": "Compute the footprint of a raster dataset",
18
+ "fill-nodata": "Fill raster regions by interpolation from edges",
19
+ "hillshade": "Generate a shaded relief map",
20
+ "index": "Create a vector index of raster datasets",
21
+ "materialize": "Materialize a piped dataset on disk to increase efficiency",
22
+ "mosaic": "Build a mosaic, either virtual (VRT) or materialized",
23
+ "neighbors": "Compute the value of each pixel from its neighbors (focal statistics)",
24
+ "nodata-to-alpha": "Replace nodata value(s) with an alpha band",
25
+ "overview": "Manage overviews of a raster dataset",
26
+ "overview add": "Add overviews to a raster dataset",
27
+ "overview delete": "Remove overviews of a raster dataset",
28
+ "overview refresh": "Refresh overviews",
29
+ "pansharpen": "Perform a pansharpen operation",
30
+ "polygonize": "Create a polygon feature dataset from a raster band",
31
+ "pixel-info": "Return information on a pixel of a raster dataset",
32
+ "rgb-to-palette": "Convert a RGB image into a pseudo-color / paletted image",
33
+ "reclassify": "Reclassify a raster dataset",
34
+ "reproject": "Reproject a raster dataset",
35
+ "resize": "Resize a raster dataset without changing the georeferenced extents",
36
+ "roughness": "Generate a roughness map",
37
+ "scale": "Scale the values of the bands of a raster dataset",
38
+ "select": "Select a subset of bands from a raster dataset",
39
+ "set-type": "Modify the data type of bands of a raster dataset",
40
+ "sieve": "Remove small raster polygons",
41
+ "slope": "Generate a slope map",
42
+ "stack": "Combine input bands into a multi-band output, either virtual (VRT) or materialized",
43
+ "tile": "Generate tiles in separate files from a raster dataset",
44
+ "tpi": "Generate a Topographic Position Index (TPI) map",
45
+ "tri": "Generate a Terrain Ruggedness Index (TRI) map",
46
+ "unscale": "Convert scaled values of a raster dataset into unscaled values",
47
+ "update": "Update the destination raster with the content of the input one",
48
+ "viewshed": "Compute the viewshed of a raster dataset",
49
+ "zonal-stats": "Compute raster zonal statistics",
50
+ }
51
+
52
+ # Vector commands
53
+ VECTOR_COMMANDS = {
54
+ "vector": "Entry point for vector commands",
55
+ "buffer": "Compute a buffer around geometries of a vector dataset",
56
+ "check-coverage": "Check a polygon coverage for validity",
57
+ "check-geometry": "Check a dataset for invalid or non-simple geometries",
58
+ "clean-coverage": "Remove gaps and overlaps in a polygon dataset",
59
+ "clip": "Clip a vector dataset",
60
+ "concat": "Concatenate vector datasets",
61
+ "convert": "Convert a vector dataset",
62
+ "edit": "Edit metadata of a vector dataset",
63
+ "explode-collections": "Explode geometries of type collection of a vector dataset",
64
+ "filter": "Filter a vector dataset",
65
+ "grid": "Create a regular grid from scattered points",
66
+ "info": "Get information on a vector dataset",
67
+ "index": "Create a vector index of vector datasets",
68
+ "layer-algebra": "Perform algebraic operation between 2 layers",
69
+ "make-point": "Create point geometries from coordinate fields",
70
+ "make-valid": "Fix validity of geometries of a vector dataset",
71
+ "materialize": "Materialize a piped dataset on disk to increase efficiency",
72
+ "partition": "Partition a vector dataset into multiple files",
73
+ "rasterize": "Burn vector geometries into a raster",
74
+ "reproject": "Reproject a vector dataset",
75
+ "segmentize": "Segmentize geometries of a vector dataset",
76
+ "select": "Select a subset of fields from a vector dataset",
77
+ "set-field-type": "Modify the type of a field of a vector dataset",
78
+ "set-geom-type": "Modify the geometry type of a vector dataset",
79
+ "simplify": "Simplify geometries of a vector dataset",
80
+ "simplify-coverage": "Simplify shared boundaries of a polygonal vector dataset",
81
+ "sort": "Spatially sort a vector dataset",
82
+ "sql": "Apply SQL statement(s) to a dataset",
83
+ "swap-xy": "Swap X and Y coordinates of geometries of a vector dataset",
84
+ "update": "Update an existing vector dataset with an input vector dataset",
85
+ }
86
+
87
+ MDIM_COMMANDS = {
88
+ "mdim": "Entry point for multidimensional commands",
89
+ "mdim info": "Get information on a multidimensional dataset",
90
+ "mdim convert": "Convert a multidimensional dataset",
91
+ "mdim mosaic": "Build a mosaic, either virtual (VRT) or materialized, from multidimensional datasets",
92
+ }
93
+
94
+
95
+ DATASET_COMMANDS = {
96
+ "dataset": "Entry point for dataset management commands",
97
+ "dataset identify": "Identify driver opening dataset(s)",
98
+ "dataset check": "Check whether there are errors when reading the content of a dataset",
99
+ "dataset copy": "Copy files of a dataset",
100
+ "dataset rename": "Rename files of a dataset",
101
+ "dataset delete": "Delete dataset(s)",
102
+ }
103
+
104
+
105
+ VSI_COMMANDS = {
106
+ "vsi": "Entry point for GDAL Virtual System Interface (VSI) commands",
107
+ "vsi copy": "Copy files located on GDAL Virtual System Interface (VSI)",
108
+ "vsi delete": "Delete files located on GDAL Virtual System Interface (VSI)",
109
+ "vsi list": "List files of one of the GDAL Virtual System Interface (VSI)",
110
+ "vsi move": "Move/rename a file/directory located on GDAL Virtual System Interface (VSI)",
111
+ "vsi sync": "Synchronize source and target file/directory located on GDAL Virtual System Interface (VSI)",
112
+ "vsi sozip": "SOZIP (Seek-Optimized ZIP) related commands",
113
+ }
114
+
115
+ DRIVER_COMMANDS = {
116
+ "driver gpkg repack": "Repack/vacuum in-place a GeoPackage dataset",
117
+ "driver gti create": "Create an index of raster datasets compatible with the GDAL Tile Index (GTI) driver",
118
+ "driver openfilegdb repack": "Repack in-place a FileGeodatabase dataset",
119
+ "driver parquet create-metadata-file": "Create the _metadata file for a partitioned Parquet dataset",
120
+ "driver pdf list-layer": "Return the list of layers of a PDF file",
121
+ }
122
+
123
+
124
+ COMMANDS = {}
125
+ COMMANDS.update(RASTER_COMMANDS)
126
+ COMMANDS.update(MDIM_COMMANDS)
127
+ COMMANDS.update(DATASET_COMMANDS)
128
+ COMMANDS.update(VSI_COMMANDS)
129
+ COMMANDS.update(DRIVER_COMMANDS)
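
The merged `COMMANDS` dict is built with successive `update()` calls, and `VECTOR_COMMANDS` is not among them. Several names, such as `clip`, `convert` and `reproject`, appear in both the raster and vector tables, so the merge order decides which description survives. The sketch below uses two tiny stand-in tables (descriptions copied from the listing) to show that behaviour; `command_types` is a hypothetical helper, not something the package provides.

```python
# Tiny stand-ins for the real tables; descriptions copied from the listing.
RASTER = {
    "clip": "Clip a raster dataset",
    "hillshade": "Generate a shaded relief map",
}
VECTOR = {
    "clip": "Clip a vector dataset",
    "buffer": "Compute a buffer around geometries of a vector dataset",
}

merged = {}
merged.update(RASTER)
merged.update(VECTOR)  # a later update wins for shared keys such as "clip"


def command_types(cmd: str) -> list[str]:
    """Hypothetical helper: list every family that defines ``cmd``."""
    families = {"raster": RASTER, "vector": VECTOR}
    return [name for name, table in families.items() if cmd in table]


print(merged["clip"])
print(command_types("clip"))
```

Keeping the per-family tables separate and querying them individually avoids losing information when names collide.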
gdalgviz-0.1.0/gdalgviz/gdalgviz.py ADDED
@@ -0,0 +1,326 @@
1
+ import shlex
2
+ from pathlib import Path
3
+ from typing import List, Dict, Optional
4
+ from graphviz import Digraph
5
+ from .commands import RASTER_COMMANDS
6
+
7
+ # supported by Graphviz
8
+ VALID_FORMATS = ["svg", "png", "pdf", "jpg"]
9
+ # URL to GDAL command documentation
10
+ GDAL_DOCS_URL_TEMPLATE = (
11
+ "https://gdal.org/en/latest/programs/gdal_{cmd_type}_{command}.html"
12
+ )
13
+ # general commands that don't have dedicated docs pages
14
+ GDAL_OPERATORS = ("read", "write", "tee")
15
+
16
+
17
+ def get_output_format(filename: str, valid_formats: list[str]) -> str:
18
+ """
19
+ Infer output format from filename extension and validate it.
20
+ Raises ValueError if the extension is invalid.
21
+ """
22
+ ext = Path(filename).suffix.lower().lstrip(".") # e.g., ".svg" -> "svg"
23
+
24
+ if ext not in valid_formats:
25
+ raise ValueError(
26
+ f"Invalid output format '{ext}'. Must be one of {valid_formats}"
27
+ )
28
+
29
+ return ext
30
+
31
+
32
+ def get_command_type(cmd: str):
33
+ """
34
+ Get the command type (raster, vector, etc.)
35
+ Take the first match if different types use the same name
36
+ e.g. "read" exists in both raster and vector pipelines
37
+
38
+ TODO add other types
39
+ """
40
+
41
+ if cmd in RASTER_COMMANDS:
42
+ return "raster"
43
+ else:
44
+ return "vector"
45
+
46
+
47
+ def add_step_node(
48
+ g: Digraph,
49
+ step_dict: Dict[str, any],
50
+ parent_ids: List[Optional[str]],
51
+ node_counter: List[int],
52
+ pipeline_type: Optional[str] = None,
53
+ ) -> List[str]:
54
+ step_str = step_dict["step"]
55
+ cmd, args = parse_step(step_str)
56
+ label = step_label_html(cmd, args)
57
+
58
+ node_id = str(node_counter[0])
59
+ node_counter[0] += 1
60
+
61
+ if pipeline_type:
62
+ cmd_type = pipeline_type
63
+ else:
64
+ cmd_type = get_command_type(cmd)
65
+
66
+ # create the node
67
+ if cmd_type and cmd.lower() not in GDAL_OPERATORS:
68
+ cmd_cleaned = cmd.replace("-", "_")
69
+ url = GDAL_DOCS_URL_TEMPLATE.format(cmd_type=cmd_type, command=cmd_cleaned)
70
+ g.node(node_id, label=label, URL=url, tooltip=url)
71
+ else:
72
+ g.node(node_id, label=label)
73
+
74
+ # connect this node to all parents
75
+ for pid in parent_ids:
76
+ if pid is not None:
77
+ g.edge(pid, node_id)
78
+
79
+ nested_steps = step_dict.get("nested", [])
80
+
81
+ if not nested_steps:
82
+ return [node_id]
83
+
84
+ # tee splits into two paths
85
+ if cmd == "tee":
86
+ # tee: nested steps are dead-end outputs
87
+ for nested in nested_steps:
88
+ add_step_node(
89
+ g,
90
+ nested,
91
+ parent_ids=[node_id],
92
+ node_counter=node_counter,
93
+ pipeline_type=pipeline_type,
94
+ )
95
+ # tee itself continues to next step
96
+ return [node_id]
97
+
98
+ # normal nested pipeline: one independent sub-pipeline
99
+ # start with no inflows
100
+ nested_parent_ids = []
101
+
102
+ for nested in nested_steps:
103
+ nested_parent_ids = add_step_node(
104
+ g,
105
+ nested,
106
+ parent_ids=nested_parent_ids, # chain sequentially
107
+ node_counter=node_counter,
108
+ pipeline_type=pipeline_type,
109
+ )
110
+
111
+ # Final nested step(s) feed into this parent node
112
+ for nid in nested_parent_ids:
113
+ g.edge(nid, node_id)
114
+
115
+ return [node_id]
116
+
117
+
118
+ def parse_step_recursive(step: str):
119
+ if "[" in step and "]" in step:
120
+ # extract the inner block
121
+ before, inner = step.split("[", 1)
122
+ inner, after = inner.rsplit("]", 1)
123
+ # recurse
124
+ nested_steps = split_pipeline(inner)
125
+ return {
126
+ "step": before.strip(),
127
+ "nested": [parse_step_recursive(s) for s in nested_steps],
128
+ }
129
+ else:
130
+ return {"step": step.strip()}
131
+
132
+
133
+ def parse_step(step: str):
134
+ """
135
+ Split a step into command and grouped arguments.
136
+ Handles arguments in the following forms:
137
+
138
+ - -r mode
139
+ - --size 3000,3000
140
+ - --bbox=112,2,116,4.5
141
+ - --dst-crs=EPSG:32632
142
+ """
143
+ tokens = shlex.split(step)
144
+ if not tokens:
145
+ return "", []
146
+
147
+ cmd = tokens[0]
148
+ args = []
149
+ i = 1
150
+
151
+ while i < len(tokens):
152
+ token = tokens[i]
153
+
154
+ # Flag that already includes a value (--x=y)
155
+ if token.startswith("-") and "=" in token:
156
+ args.append(token)
157
+ i += 1
158
+
159
+ # Flag that may consume the next token
160
+ elif token.startswith("-"):
161
+ if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
162
+ args.append(f"{token} {tokens[i + 1]}")
163
+ i += 2
164
+ else:
165
+ args.append(token)
166
+ i += 1
167
+
168
+ else:
169
+ args.append(token)
170
+ i += 1
171
+
172
+ return cmd, args
173
+
174
+
175
+ def step_label_html(cmd, args):
176
+ """
177
+ Create an HTML-like Graphviz label for a node
178
+ """
179
+
180
+ # add the command as the title
181
+ rows = [f'<TR><TD BGCOLOR="#cfe2ff" ALIGN="CENTER"><B>{cmd}</B></TD></TR>']
182
+
183
+ # add the arguments in the table below
184
+ for arg in args:
185
+ rows.append(f'<TR><TD ALIGN="LEFT">{arg}</TD></TR>')
186
+
187
+ # wrap everything in a <TABLE>
188
+ # the outer < > are required for Graphviz HTML labels
189
+ return f"""<
190
+ <TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="6">
191
+ {''.join(rows)}
192
+ </TABLE>
193
+ >"""
194
+
195
+
196
+ def workflow_diagram(
197
+ steps, output_format: str, pipeline_type=None, title: str = "GDALG Workflow"
198
+ ):
199
+ g = Digraph(
200
+ name=title,
201
+ format=output_format,
202
+ graph_attr={"rankdir": "LR"},  # "LR" = left to right; use "TB" for top to bottom
203
+ )
204
+
205
+ g.attr(
206
+ "node",
207
+ shape="plain", # required for HTML labels
208
+ fontname="Helvetica",
209
+ )
210
+
211
+ node_counter = [0]
212
+ # parse steps recursively first
213
+ step_dicts = [parse_step_recursive(s) for s in steps]
214
+
215
+ # add all nodes recursively
216
+ last_ids = [] # keeps track of the last nodes at the top level
217
+
218
+ for sd in step_dicts:
219
+ last_ids = add_step_node(
220
+ g,
221
+ sd,
222
+ parent_ids=last_ids or [None],
223
+ node_counter=node_counter,
224
+ pipeline_type=pipeline_type,
225
+ )
226
+
227
+ return g
228
+
229
+
230
+ def tokenize(text: str):
231
+ tokens = text.lstrip().split()
232
+ lowered = [t.lower() for t in tokens]
233
+ return tokens, lowered
234
+
235
+
236
+ def strip_prefix(text: str) -> str:
237
+ """
238
+ Remove any leading GDAL pipeline prefix
239
+ """
240
+ tokens, lowered = tokenize(text)
241
+
242
+ prefixes = [
243
+ ["gdal", "vector", "pipeline"],
244
+ ["gdal", "raster", "pipeline"],
245
+ ["gdal", "pipeline"],
246
+ ]
247
+
248
+ for prefix in prefixes:
249
+ if lowered[: len(prefix)] == prefix:
250
+ return " ".join(tokens[len(prefix) :])
251
+
252
+ return text
253
+
254
+
255
+ def detect_pipeline_type(text: str) -> Optional[str]:
256
+ """
257
+ Return 'raster' or 'vector' if the second word matches.
258
+ Otherwise return None.
259
+ """
260
+ _, lowered = tokenize(text)
261
+
262
+ if len(lowered) >= 2 and lowered[1] in ("raster", "vector"):
263
+ return lowered[1]
264
+
265
+ return None
266
+
267
+
268
+ def split_pipeline(command_line: str) -> List[str]:
269
+ """
270
+ Split a GDAL pipeline command_line into steps, handling nested brackets.
271
+ Returns a flat list of step strings; a nested pipeline stays inside its step as bracketed text.
272
+ """
273
+
274
+ command_line = strip_prefix(command_line)
275
+
276
+ steps = []
277
+ current = ""
278
+ stack = [] # track open brackets
279
+ i = 0
280
+ while i < len(command_line):
281
+ c = command_line[i]
282
+
283
+ if c == "[":
284
+ stack.append("[")
285
+ current += c
286
+ elif c == "]":
287
+ stack.pop()
288
+ current += c
289
+ elif c == "!" and not stack:
290
+ # end of step at this level
291
+ step = current.strip()
292
+ if step:
293
+ steps.append(step)
294
+ current = ""
295
+ else:
296
+ current += c
297
+ i += 1
298
+
299
+ if current.strip():
300
+ steps.append(current.strip())
301
+
302
+ return steps
303
+
304
+
305
+ def generate_diagram(pipeline: str, output_fn: str):
306
+ """
307
+ Generate a workflow diagram from a GDAL pipeline command line and save
308
+ it to the specified output file.
309
+ """
310
+
311
+ output_format = get_output_format(output_fn, VALID_FORMATS)
312
+ pipeline_type = detect_pipeline_type(pipeline)
313
+ steps = split_pipeline(pipeline)
314
+
315
+ diagram = workflow_diagram(steps, output_format, pipeline_type)
316
+ # remove extension or it gets added twice
317
+ output_file = Path(output_fn)
318
+ output_stem = output_file.with_suffix("")
319
+ diagram.render(output_stem, cleanup=True)
320
+
321
+
322
+ if __name__ == "__main__":
323
+ pipeline = "gdal vector pipeline ! read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom"
324
+ output_fn = "./examples/raster.svg"
325
+ generate_diagram(pipeline, output_fn)
326
+ print("Done!")
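
`step_label_html` interpolates the command and its arguments directly into an HTML-like Graphviz label. Graphviz parses such labels as markup, so an argument containing `<`, `>` or `&` (for example a filter expression) is likely to be rejected or misrendered. A minimal sketch of one way to guard against that, assuming the standard library's `html.escape` is acceptable here; `escaped_label` is a hypothetical variant, not the package's function.

```python
from html import escape


def escaped_label(cmd: str, args: list[str]) -> str:
    """Sketch of an HTML-like Graphviz label with escaped cell text."""
    # quote=False escapes only &, < and >, which is all cell text needs.
    title = escape(cmd, quote=False)
    rows = [f'<TR><TD BGCOLOR="#cfe2ff" ALIGN="CENTER"><B>{title}</B></TD></TR>']
    for arg in args:
        rows.append(f'<TR><TD ALIGN="LEFT">{escape(arg, quote=False)}</TD></TR>')
    table = (
        '<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="6">'
        + "".join(rows)
        + "</TABLE>"
    )
    # The outer < > mark the string as an HTML-like label.
    return f"<{table}>"


label = escaped_label("filter", ["--where population > 1000 & area < 5"])
print(label)
```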
gdalgviz-0.1.0/gdalgviz.egg-info/PKG-INFO ADDED
@@ -0,0 +1,88 @@
1
+ Metadata-Version: 2.4
2
+ Name: gdalgviz
3
+ Version: 0.1.0
4
+ Summary: CLI tool for visualizing GDALG workflows
5
+ Author: Seth Girvin
6
+ License: MIT
7
+ Requires-Python: >=3.9
8
+ Description-Content-Type: text/markdown
9
+ License-File: LICENSE
10
+ Requires-Dist: graphviz>=0.20
11
+ Provides-Extra: dev
12
+ Requires-Dist: pytest; extra == "dev"
13
+ Requires-Dist: black; extra == "dev"
14
+ Requires-Dist: mypy; extra == "dev"
15
+ Requires-Dist: ruff; extra == "dev"
16
+ Dynamic: license-file
17
+
18
+ # gdalgviz
19
+
20
+ A Python library to visualise [GDAL pipelines](https://gdal.org/en/latest/programs/gdal_pipeline.html).
21
+
22
+ ## Installation
23
+
24
+ Requires [graphviz](https://graphviz.org/download/) to be installed on the system, and
25
+ has a dependency on the [graphviz](https://pypi.org/project/graphviz/) Python package.
26
+
27
+ On Windows:
28
+
29
+ ```powershell
30
+ $GVIZ_PATH = "C:\Program Files\Graphviz\bin"
31
+ $env:PATH = "$GVIZ_PATH;$env:PATH"
32
+ pip install gdalgviz
33
+ gdalgviz --version
34
+ ```
35
+
36
+ ## Usage
37
+
38
+ Passing a pipeline as a JSON file ([tee.json](./examples/tee.json)):
39
+
40
+ ```bash
41
+ gdalgviz ./examples/tee.json ./examples/tee.svg
42
+ ```
43
+
44
+ ![Workflow Diagram](./examples/tee.svg)
45
+
46
+ Passing a pipeline as a string:
47
+
48
+ ```bash
49
+ gdalgviz --pipeline "gdal vector pipeline ! read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom" ./examples/pipeline.svg
50
+ ```
51
+
52
+ ![Workflow Diagram](./examples/pipeline.svg)
53
+
54
+ - Handles both JSON and text input. See [JSON Schema](./examples/gdalg.schema.json)
55
+ - Supports [nested pipelines](https://gdal.org/en/latest/programs/gdal_pipeline.html#nested-pipeline). These
56
+ allow sub-pipelines to be run in parallel and merged later.
57
+ - Supports [tee](https://gdal.org/en/latest/programs/gdal_pipeline.html#output-nested-pipeline) -
58
+ the operation is named "tee" because it splits the stream, like the letter "T": one input, multiple outputs,
59
+ and allows saving of intermediate results
60
+
61
+ This library does not execute the GDAL pipeline; it only visualizes it. Execution is handled by GDAL itself, for example through its Python bindings:
62
+
63
+ ```python
64
+ from osgeo import gdal
65
+
66
+ gdal.UseExceptions()
67
+ with gdal.alg.pipeline(pipeline="read byte.tif ! reproject --dst-crs EPSG:4326 --resampling cubic") as alg:
68
+ ds = alg.Output()
69
+ ```
70
+
71
+ ## Development
72
+
73
+ ```powershell
74
+ pip install -e .[dev]
75
+ black .
76
+ ruff check . --fix
77
+ # mypy .
78
+ pytest tests
79
+ gdalgviz ./examples/tee.json ./examples/tee.svg
80
+ gdalgviz --pipeline "gdal vector pipeline ! read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom" ./examples/pipeline.svg
81
+
82
+ ```
83
+
84
+ ## Roadmap
85
+
86
+ - Add JSON schema validation
87
+ - Add colour coding of the graph depending on whether the command is raster, vector, etc.
88
+ - Add types to the codebase
gdalgviz-0.1.0/gdalgviz.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,15 @@
1
+ LICENSE
2
+ README.md
3
+ pyproject.toml
4
+ gdalgviz/__init__.py
5
+ gdalgviz/cli.py
6
+ gdalgviz/commands.py
7
+ gdalgviz/gdalgviz.py
8
+ gdalgviz.egg-info/PKG-INFO
9
+ gdalgviz.egg-info/SOURCES.txt
10
+ gdalgviz.egg-info/dependency_links.txt
11
+ gdalgviz.egg-info/entry_points.txt
12
+ gdalgviz.egg-info/requires.txt
13
+ gdalgviz.egg-info/top_level.txt
14
+ tests/test_cli.py
15
+ tests/test_parser.py
gdalgviz-0.1.0/gdalgviz.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ gdalgviz = gdalgviz.cli:main
gdalgviz-0.1.0/gdalgviz.egg-info/requires.txt ADDED
@@ -0,0 +1,7 @@
1
+ graphviz>=0.20
2
+
3
+ [dev]
4
+ pytest
5
+ black
6
+ mypy
7
+ ruff
gdalgviz-0.1.0/gdalgviz.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
1
+ gdalgviz
gdalgviz-0.1.0/pyproject.toml ADDED
@@ -0,0 +1,45 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "gdalgviz"
7
+ version = "0.1.0"
8
+ description = "CLI tool for visualizing GDALG workflows"
9
+ readme = "README.md"
10
+ requires-python = ">=3.9"
11
+ license = { text = "MIT" }
12
+ authors = [{ name = "Seth Girvin" }]
13
+ dependencies = [
14
+ "graphviz>=0.20",
15
+ ]
16
+
17
+ [project.scripts]
18
+ gdalgviz = "gdalgviz.cli:main"
19
+
20
+ [project.optional-dependencies]
21
+ dev = [
22
+ "pytest",
23
+ "black",
24
+ "mypy",
25
+ "ruff",
26
+ ]
27
+
28
+ [tool.black]
29
+ line-length = 88
30
+
31
+
32
+ exclude = '''
33
+ /(
34
+ \.venv
35
+ | \.git
36
+ | build
37
+ | dist
38
+ )/
39
+ '''
40
+
41
+ [tool.mypy]
42
+ check_untyped_defs = true
43
+ disallow_untyped_defs = true
44
+ ignore_missing_imports = true
45
+ exclude = '(\.venv|__pycache__|build|dist|tests)'
gdalgviz-0.1.0/setup.cfg ADDED
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
gdalgviz-0.1.0/tests/test_cli.py ADDED
@@ -0,0 +1,64 @@
1
+ from unittest.mock import patch
2
+
3
+ from gdalgviz import cli # assuming cli.py is inside gdalgviz/ folder
4
+
5
+
6
+ def test_main_with_pipeline_string(tmp_path):
7
+ """
8
+ Test passing a raw pipeline string via --pipeline
9
+ """
10
+ output_file = tmp_path / "output.svg"
11
+ pipeline_str = "gdal_translate input.tif output.tif"
12
+
13
+ with patch("gdalgviz.cli.generate_diagram") as mock_generate:
14
+ mock_generate.return_value = 0
15
+
16
+ argv = ["--pipeline", pipeline_str, str(output_file)]
17
+ exit_code = cli.main(argv)
18
+
19
+ # Check that the CLI returned the mocked exit code
20
+ assert exit_code == 0
21
+
22
+ # Check that generate_diagram was called with the correct args
23
+ mock_generate.assert_called_once_with(
24
+ pipeline=pipeline_str, output_fn=str(output_file)
25
+ )
26
+
27
+
28
+ def test_main_with_missing_file(tmp_path):
29
+ """
30
+ Test that CLI returns an error code when a file does not exist
31
+ """
32
+ output_file = tmp_path / "output.svg"
33
+ missing_file = tmp_path / "does_not_exist.txt"
34
+
35
+ with patch("gdalgviz.cli.generate_diagram") as mock_generate:
36
+ argv = [str(missing_file), str(output_file)]
37
+ exit_code = cli.main(argv)
38
+
39
+ # Should return 1 because the input file does not exist
40
+ assert exit_code == 1
41
+ # generate_diagram should not be called
42
+ mock_generate.assert_not_called()
43
+
44
+
45
+ def test_main_with_file(tmp_path):
46
+ """
47
+ Test passing a real text file as input
48
+ """
49
+ input_file = tmp_path / "pipeline.txt"
50
+ output_file = tmp_path / "output.svg"
51
+ pipeline_content = "gdal_translate input.tif output.tif"
52
+
53
+ input_file.write_text(pipeline_content)
54
+
55
+ with patch("gdalgviz.cli.generate_diagram") as mock_generate:
56
+ mock_generate.return_value = 0
57
+
58
+ argv = [str(input_file), str(output_file)]
59
+ exit_code = cli.main(argv)
60
+
61
+ assert exit_code == 0
62
+ mock_generate.assert_called_once_with(
63
+ pipeline=pipeline_content, output_fn=str(output_file)
64
+ )
gdalgviz-0.1.0/tests/test_parser.py ADDED
@@ -0,0 +1,97 @@
1
+ from gdalgviz.gdalgviz import split_pipeline, parse_step, parse_step_recursive
2
+
3
+
4
+ def test_split_pipeline():
5
+ res = split_pipeline(
6
+ "gdal vector pipeline ! read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom"
7
+ )
8
+ assert len(res) == 3
9
+
10
+ res = split_pipeline(
11
+ "gdal vector pipeline read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom"
12
+ )
13
+ assert len(res) == 3
14
+
15
+ res = split_pipeline(
16
+ "GDAL vector pipeline read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom"
17
+ )
18
+ assert len(res) == 3
19
+
20
+ res = split_pipeline(
21
+ "GDAL pipeline read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom"
22
+ )
23
+ assert len(res) == 3
24
+
25
+ res = split_pipeline(
26
+ "gdal raster pipeline read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom"
27
+ )
28
+ assert len(res) == 3
29
+
30
+ res = split_pipeline(
31
+ "gdal raster pipeline ! read in.tif ! reproject --dst-crs=EPSG:32632 ! select --fields fid,geom"
32
+ )
33
+ assert len(res) == 3
34
+
35
+
36
+ def test_parse_step():
37
+ res = parse_step("read in.tif")
38
+ assert len(res) == 2
39
+ assert res[0] == "read"
40
+ assert len(res[1]) == 1 and res[1][0] == "in.tif"
41
+
42
+ res = parse_step(
43
+ "reproject -r mode -d EPSG:4326 --bbox=112,2,116,4.5 --bbox-crs=EPSG:4326 --size 3000,3000"
44
+ )
45
+
46
+ assert len(res) == 2
47
+ assert len(res[1]) == 5
48
+
49
+
50
+ def test_parse_step_recursive():
51
+
52
+ steps = split_pipeline("""gdal raster pipeline
53
+ ! read n43.tif
54
+ ! color-map --color-map color_file.txt
55
+ ! tee
56
+ [ write colored.tif --overwrite ]
57
+ ! blend --operator=hsv-value --overlay
58
+ [
59
+ read n43.tif
60
+ ! hillshade -z 30
61
+ ! tee
62
+ [
63
+ write hillshade.tif --overwrite
64
+ ]
65
+ ]
66
+ ! write colored-hillshade.tif --overwrite
67
+ """)
68
+
69
+ step_dicts = [parse_step_recursive(s) for s in steps]
70
+
71
+ for d in step_dicts:
72
+ for k, v in d.items():
73
+ print(k, v)
74
+
75
+ """
76
+ step read n43.tif
77
+ step color-map --color-map color_file.txt
78
+ step tee
79
+ nested [{'step': 'write colored.tif --overwrite'}]
80
+ step blend --operator=hsv-value --overlay
81
+ nested [{'step': 'read n43.tif'}, {'step': 'hillshade -z 30'}, {'step': 'tee', 'nested': [{'step': 'write hillshade.tif --overwrite'}]}]
82
+ step write colored-hillshade.tif --overwrite
83
+ """
84
+
85
+ assert len(step_dicts) == 5
86
+ assert len(step_dicts[0]) == 1
87
+ assert len(step_dicts[1]) == 1
88
+ assert len(step_dicts[2]) == 2
89
+ assert len(step_dicts[3]) == 2
90
+ assert len(step_dicts[4]) == 1
91
+
92
+
93
+ if __name__ == "__main__":
94
+ test_split_pipeline()
95
+ test_parse_step()
96
+ test_parse_step_recursive()
97
+ print("Done!")
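
The parser tests above only count steps and dictionary keys. `split_pipeline` also calls `stack.pop()` on every `]`, which raises `IndexError` when a pipeline has a stray closing bracket, and it accepts an unclosed `[` silently. The sketch below is a self-contained approximation of the same bracket-aware split, with explicit balance checks and a structural assertion on the nested result. `split_steps` and `to_tree` are hypothetical stand-ins for `split_pipeline` and `parse_step_recursive`, and the `gdal ... pipeline` prefix handling is left out.

```python
def split_steps(text: str) -> list[str]:
    """Split on "!" outside brackets; raise on unbalanced brackets."""
    steps, current, depth = [], "", 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unexpected ']' in pipeline")
        if ch == "!" and depth == 0:
            # End of a step at the top level.
            if current.strip():
                steps.append(current.strip())
            current = ""
        else:
            current += ch
    if depth != 0:
        raise ValueError("missing ']' in pipeline")
    if current.strip():
        steps.append(current.strip())
    return steps


def to_tree(step: str) -> dict:
    """Turn "cmd [ a ! b ]" into {"step": "cmd", "nested": [...]}."""
    if "[" not in step:
        return {"step": step.strip()}
    before, inner = step.split("[", 1)
    inner = inner.rsplit("]", 1)[0]
    nested = [to_tree(s) for s in split_steps(inner)]
    return {"step": before.strip(), "nested": nested}


tree = [
    to_tree(s)
    for s in split_steps(
        "read n43.tif ! tee [ write colored.tif --overwrite ] ! write out.tif"
    )
]
print(tree)
```

Asserting on the whole tree, rather than on lengths, would catch a step that lands at the wrong nesting level.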