sdgis-cli 1.0.0__tar.gz

@@ -0,0 +1,191 @@
+ Metadata-Version: 2.4
+ Name: sdgis-cli
+ Version: 1.0.0
+ Summary: CLI for the San Diego Regional Data Warehouse (SANDAG/SanGIS)
+ Author: Your Name
+ License: MIT
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Topic :: Scientific/Engineering :: GIS
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: click>=8.0
+ Requires-Dist: requests>=2.28
+ Requires-Dist: rich>=13.0
+ Provides-Extra: embed
+ Requires-Dist: sentence-transformers>=2.0; extra == "embed"
+ Requires-Dist: numpy>=1.20; extra == "embed"
+ Provides-Extra: map
+ Requires-Dist: staticmap>=0.5; extra == "map"
+ Provides-Extra: all
+ Requires-Dist: sentence-transformers>=2.0; extra == "all"
+ Requires-Dist: numpy>=1.20; extra == "all"
+ Requires-Dist: staticmap>=0.5; extra == "all"
+ Dynamic: author
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: license
+ Dynamic: provides-extra
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # sdgis: San Diego Regional Data Warehouse CLI
+
+ A command-line tool for exploring, querying, and downloading **360+ GIS datasets** from the [San Diego Regional Data Warehouse](https://geo.sandag.org), maintained by SANDAG and SanGIS.
+
+ ## Why use this?
+
+ The SANDAG data warehouse is one of the most comprehensive public GIS repositories for San Diego County, but it is reachable only through a web portal and ArcGIS REST APIs that are painful to work with directly. This CLI makes that data scriptable.
+
+ **Use it if you want to:**
+
+ - **Research or analyze San Diego**: parcels, zoning, census tracts, bike infrastructure, fire stations, hydrology, affordable housing, business licenses, broadband coverage, and much more
+ - **Feed data to an AI agent**: commands that return data can write clean JSON to stdout while status messages go to stderr, so output is easy to pipe into LLM workflows
+ - **Script data pipelines**: pull live feature data with SQL-style filters, bounding boxes, and pagination; pipe directly to `jq`, `ogr2ogr`, or files
+ - **Explore what's available**: semantic search across 360+ datasets lets you find relevant data without knowing exact dataset names
+
+ ## Installation
+
+ ```bash
+ # Editable install from a source checkout
+ pip install -e .
+
+ # For semantic search (recommended), add the "embed" extra,
+ # which pulls in sentence-transformers and numpy:
+ pip install -e ".[embed]"
+ ```
+
+ ## Setup (first time)
+
+ Build the local search index. This downloads the dataset catalog and computes embeddings (the model is about 22 MB; indexing takes roughly 30 s):
+
+ ```bash
+ sdgis index
+ ```
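To illustrate what ranking against an embedding index involves, here is a minimal pure-Python sketch of cosine-similarity search. It is not taken from the package: the `rank_datasets` helper and the toy three-dimensional vectors are assumptions made for illustration, and a real index would presumably hold much longer vectors produced by a sentence-transformers model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_datasets(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the top_k (name, score) pairs, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index: hand-made vectors standing in for real embeddings (assumption).
toy_index = {
    "Bikeways": [0.9, 0.1, 0.0],
    "Hydrological_Basins": [0.0, 0.2, 0.9],
    "ABC_Licenses": [0.1, 0.9, 0.1],
}
results = rank_datasets([1.0, 0.0, 0.0], toy_index, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Sorting by similarity rather than by exact keyword match is what lets a query phrased in everyday words surface a dataset whose name shares none of them.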
+
+ ## Quick Start
+
+ ```bash
+ # Semantic search: find relevant datasets without knowing exact names
+ sdgis search "bike infrastructure"
+ sdgis search "water and flooding"
+ sdgis search "affordable housing near transit"
+
+ # Understand a dataset before querying it (great for agents)
+ sdgis describe Bikeways
+
+ # Count features (with optional filter)
+ sdgis count Bikeways
+ sdgis count ABC_Licenses --where "LICENSE_TYPE='21'"
+
+ # Query features
+ sdgis query Bikeways --limit 5
+ sdgis query Bikeways --where "RD_NAME='Coast Blvd'" --fields "RD_NAME,CLASS"
+ sdgis query ABC_Licenses --bbox "-117.2,32.7,-117.1,32.8" --limit 50
+
+ # Output as JSON, CSV, or GeoJSON
+ sdgis query Bikeways --limit 100 -f json
+ sdgis query Bikeways --limit 100 -f csv > bikeways.csv
+ sdgis query Bikeways --limit 100 -f geojson > bikeways.geojson
+
+ # Fetch ALL features with automatic pagination
+ sdgis query-all Bikeways -f geojson > all_bikeways.geojson
+
+ # Download pre-built exports
+ sdgis download Bikeways -f shapefile
+ ```
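The `--where`, `--fields`, `--bbox`, and `--limit` options above resemble the parameters of an ArcGIS REST feature-layer `query` request. The sketch below shows one way such options could be turned into a request URL. It is an illustration, not the package's code: the `build_query_url` name and the `FeatureServer/0` layer path are assumptions, while `where`, `outFields`, `geometry`, `geometryType`, `inSR`, `spatialRel`, `resultRecordCount`, and `f` are standard ArcGIS REST query parameters, and the base URL is the one the README lists under Data Source.

```python
from typing import Optional
from urllib.parse import urlencode

# REST services root quoted in the README's "Data Source" section.
BASE_URL = "https://geo.sandag.org/server/rest/services/Hosted"


def build_query_url(
    dataset: str,
    where: str = "1=1",
    fields: str = "*",
    bbox: Optional[str] = None,
    limit: Optional[int] = None,
    fmt: str = "json",
) -> str:
    """Hypothetical helper: map CLI-style options onto ArcGIS REST query parameters."""
    params = {"where": where, "outFields": fields, "f": fmt}
    if bbox is not None:
        # An envelope given as "xmin,ymin,xmax,ymax" in WGS84 (spatial reference 4326).
        params.update(
            {
                "geometry": bbox,
                "geometryType": "esriGeometryEnvelope",
                "inSR": "4326",
                "spatialRel": "esriSpatialRelIntersects",
            }
        )
    if limit is not None:
        params["resultRecordCount"] = str(limit)
    # Assumption: each dataset is served as layer 0 of a FeatureServer.
    return f"{BASE_URL}/{dataset}/FeatureServer/0/query?{urlencode(params)}"


url = build_query_url(
    "Bikeways", where="RD_NAME='Coast Blvd'", fields="RD_NAME,CLASS", limit=5
)
bbox_url = build_query_url("ABC_Licenses", bbox="-117.2,32.7,-117.1,32.8", limit=50)
print(url)
print(bbox_url)
```

Letting `urlencode` do the escaping keeps quotes, spaces, and commas in a WHERE clause from corrupting the URL.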
+
+ ## Commands
+
+ | Command | Description |
+ |---------|-------------|
+ | `index` | Build local SQLite index with semantic embeddings |
+ | `search <query>` | Semantic / FTS / fuzzy search across all datasets |
+ | `describe <dataset>` | Schema + feature count + sample rows as JSON (agent-friendly) |
+ | `list` | List all available datasets |
+ | `info <dataset>` | Show schema, fields, metadata, and links |
+ | `fields <dataset>` | List all fields with types and domains |
+ | `head <dataset>` | Quick preview: 3 rows + schema summary |
+ | `count <dataset>` | Count total features (supports WHERE clause) |
+ | `query <dataset>` | Query features with filters, pagination, bounding box |
+ | `query-all <dataset>` | Fetch all features with automatic pagination |
+ | `sql <dataset> <where>` | Shorthand for WHERE clause queries |
+ | `download <dataset>` | Download pre-built GeoJSON / CSV / Shapefile / FGDB |
+ | `url <dataset>` | Generate REST, portal, or download URLs |
+ | `categories` | List the 18 dataset categories |
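The "automatic pagination" that `query-all` advertises can be pictured as an offset loop. The sketch below is an assumption about the general technique, not the package's implementation: `iter_all_features` and `make_fake_server` are hypothetical names, and the fake server stands in for HTTP calls. The `features` list and the `exceededTransferLimit` flag follow the shape of an ArcGIS REST query response, where paging is driven by the `resultOffset` and `resultRecordCount` parameters.

```python
from typing import Callable, Dict, Iterator


def iter_all_features(
    fetch_page: Callable[[int, int], Dict], page_size: int = 1000
) -> Iterator[Dict]:
    """Yield every feature by requesting pages until the server reports no more."""
    offset = 0
    while True:
        # fetch_page(offset, count) plays the role of a request with
        # resultOffset=offset and resultRecordCount=count.
        page = fetch_page(offset, page_size)
        features = page.get("features", [])
        yield from features
        # Stop on an empty page or when the server says nothing was cut off.
        if not features or not page.get("exceededTransferLimit", False):
            break
        offset += len(features)


def make_fake_server(total: int) -> Callable[[int, int], Dict]:
    """Stand-in for the REST endpoint so the sketch runs offline (assumption)."""

    def fetch_page(offset: int, count: int) -> Dict:
        stop = min(offset + count, total)
        rows = [{"attributes": {"OBJECTID": i}} for i in range(offset, stop)]
        return {"features": rows, "exceededTransferLimit": offset + count < total}

    return fetch_page


all_rows = list(iter_all_features(make_fake_server(total=25), page_size=10))
print(len(all_rows))
```

Advancing the offset by the number of rows actually returned, rather than by the requested page size, keeps the loop correct when a server caps pages below the requested size.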
+
+ ## For AI Agents
+
+ Every command that returns data can write **clean JSON to stdout** with no ANSI codes, and status messages go to stderr. The examples below pass `--json-output` or a `-f` format flag on commands such as `search`, `count`, and `query`. This makes the tool easy to use with any LLM tool framework.
+
+ Typical agent workflow:
+
+ ```bash
+ # 1. Find relevant datasets
+ sdgis search "stormwater infrastructure" --json-output
+
+ # 2. Understand a dataset's schema and sample data in one call
+ sdgis describe Hydrological_Basins
+
+ # 3. Count matching features before pulling all data
+ sdgis count Hydrological_Basins --where "AREA_SQMI > 10" --json-output
+
+ # 4. Pull the data
+ sdgis query Hydrological_Basins --where "AREA_SQMI > 10" -f geojson
+ ```
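Because data goes to stdout and status to stderr, an agent framework can call the CLI as a subprocess and parse the two streams separately. The following is a minimal sketch of that pattern. `run_json_command` is a hypothetical helper, and a small Python one-liner stands in for `sdgis` so the example runs without the package installed. The README does not specify the shape of the JSON a real command returns.

```python
import json
import subprocess
import sys
from typing import Any, List, Tuple


def run_json_command(argv: List[str], timeout: float = 60.0) -> Tuple[Any, str]:
    """Run a command, parse stdout as JSON, and return (data, stderr_text)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout), proc.stderr


# Stand-in for a CLI call (assumption): status on stderr, JSON on stdout.
fake_cli = [
    sys.executable,
    "-c",
    "import sys, json; "
    "print('working...', file=sys.stderr); "
    "print(json.dumps({'count': 42}))",
]
data, status = run_json_command(fake_cli)
print(data["count"])
```

With the tool installed, the same helper could be pointed at an invocation from the workflow above, for example `["sdgis", "count", "Hydrological_Basins", "--json-output"]`.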
+
+ ## Dataset Categories
+
+ Agriculture, Business, Census, Community, District, Ecology & Parks, Elevation,
+ Fire, Health & Public Safety, Hydrology & Geology, Jurisdiction, Landbase,
+ Land Use, Miscellaneous, Place, Transportation, Utilities, Zoning
+
+ ## Output Formats
+
+ - **table**: Rich-formatted terminal table (default, human-readable)
+ - **json**: Raw ArcGIS JSON response
+ - **geojson**: Standard GeoJSON FeatureCollection
+ - **csv**: Comma-separated values (attributes only)
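To make "attributes only" concrete, the sketch below flattens a GeoJSON FeatureCollection into CSV by keeping each feature's `properties` and dropping its `geometry`. It only illustrates the relationship between the two formats. `geojson_to_csv` is a hypothetical helper, the sample values are made up, and the package may produce its CSV differently.

```python
import csv
import io
from typing import Dict, List


def geojson_to_csv(collection: Dict) -> str:
    """Return CSV text holding each feature's properties; geometry is discarded."""
    features = collection.get("features", [])
    # Column order: union of property names in first-seen order.
    fieldnames: List[str] = []
    for feat in features:
        for key in feat.get("properties") or {}:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for feat in features:
        # Missing properties are written as empty cells by DictWriter.
        writer.writerow(feat.get("properties") or {})
    return buf.getvalue()


# Made-up sample data; only the field names come from the README examples.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-117.16, 32.72]},
            "properties": {"RD_NAME": "Coast Blvd", "CLASS": "2"},
        },
        {"type": "Feature", "geometry": None, "properties": {"RD_NAME": "Harbor Dr"}},
    ],
}
csv_text = geojson_to_csv(sample)
print(csv_text)
```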
+
+ ## Spatial Queries
+
+ Filter by bounding box in WGS84 longitude/latitude. The values in the example below are ordered `min_lon,min_lat,max_lon,max_lat`:
+
+ ```bash
+ sdgis query ABC_Licenses --bbox "-117.2,32.7,-117.1,32.8" --limit 100 -f geojson
+ ```
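A swapped longitude and latitude is an easy mistake with bounding boxes, so it can help to validate the string before sending a query. The sketch below is one possible check, assuming the `min_lon,min_lat,max_lon,max_lat` order seen in the example. `parse_bbox` is a hypothetical helper, not a function exported by the package.

```python
from typing import Tuple


def parse_bbox(text: str) -> Tuple[float, float, float, float]:
    """Parse 'min_lon,min_lat,max_lon,max_lat' and reject out-of-range or inverted boxes."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError("bbox needs four comma-separated numbers")
    # float() raises ValueError on non-numeric input.
    min_lon, min_lat, max_lon, max_lat = (float(p) for p in parts)
    if not (-180.0 <= min_lon <= 180.0 and -180.0 <= max_lon <= 180.0):
        raise ValueError("longitude must be between -180 and 180")
    if not (-90.0 <= min_lat <= 90.0 and -90.0 <= max_lat <= 90.0):
        raise ValueError("latitude must be between -90 and 90")
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("minimum corner must be below and left of the maximum corner")
    return min_lon, min_lat, max_lon, max_lat


box = parse_bbox("-117.2,32.7,-117.1,32.8")
print(box)
```

For San Diego the range check alone catches a lat/lon swap, because a longitude near -117 is not a valid latitude.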
+
+ ## Piping & Scripting
+
+ ```bash
+ # Count features in every transportation dataset
+ sdgis search transportation --json-output | \
+   jq -r '.[].name' | \
+   while read -r ds; do
+     printf '%s: ' "$ds"
+     sdgis count "$ds" --json-output 2>/dev/null
+   done
+
+ # Convert to GeoPackage with ogr2ogr
+ sdgis query-all Bikeways -f geojson | ogr2ogr -f "GPKG" bikeways.gpkg /vsistdin/
+ ```
+
+ ## About the Data Warehouse
+
+ SanGIS and SANDAG have partnered to provide the San Diego region with a single authoritative source of GIS data through the **San Diego Regional Data Warehouse**. It contains hundreds of layers across 18 categories, collected from multiple sources including the City of San Diego, the County of San Diego, the State of California, and the federal government, all free for public use.
+
+ Datasets cover everything from addresses to zoning: roads and freeways, property and city boundaries, census areas, community planning areas, lakes, streams, business zones, and much more. Data is available as hosted feature services (for interactive viewing and metadata review) and as downloads in FileGDB, Shapefile, CSV, GeoJSON, and JSON formats.
+
+ > **Note:** Per California Assembly Bill 1785 (AB 1785), SanGIS no longer publishes parcel owner name and address information in publicly accessible online locations. For parcel owner data or technical issues, contact [webmaster@sangis.org](mailto:webmaster@sangis.org).
+
+ Data is provided for convenience with no warranty as to accuracy. Users should review the [SanGIS Legal Notice](https://www.sangis.org/legal-notices) and [SANDAG Privacy Policy](https://www.sandag.org/privacy-policy) prior to use.
+
+ ## Data Source
+
+ All data comes from the **San Diego Regional Data Warehouse** operated by SANDAG (San Diego Association of Governments) and SanGIS.
+
+ - Portal: https://geo.sandag.org
+ - REST Services: https://geo.sandag.org/server/rest/services/Hosted
@@ -0,0 +1,155 @@
+ # sdgis: San Diego Regional Data Warehouse CLI
+
+ A command-line tool for exploring, querying, and downloading **360+ GIS datasets** from the [San Diego Regional Data Warehouse](https://geo.sandag.org), maintained by SANDAG and SanGIS.
+
+ ## Why use this?
+
+ The SANDAG data warehouse is one of the most comprehensive public GIS repositories for San Diego County, but it is reachable only through a web portal and ArcGIS REST APIs that are painful to work with directly. This CLI makes that data scriptable.
+
+ **Use it if you want to:**
+
+ - **Research or analyze San Diego**: parcels, zoning, census tracts, bike infrastructure, fire stations, hydrology, affordable housing, business licenses, broadband coverage, and much more
+ - **Feed data to an AI agent**: commands that return data can write clean JSON to stdout while status messages go to stderr, so output is easy to pipe into LLM workflows
+ - **Script data pipelines**: pull live feature data with SQL-style filters, bounding boxes, and pagination; pipe directly to `jq`, `ogr2ogr`, or files
+ - **Explore what's available**: semantic search across 360+ datasets lets you find relevant data without knowing exact dataset names
+
+ ## Installation
+
+ ```bash
+ # Editable install from a source checkout
+ pip install -e .
+
+ # For semantic search (recommended), add the "embed" extra,
+ # which pulls in sentence-transformers and numpy:
+ pip install -e ".[embed]"
+ ```
+
+ ## Setup (first time)
+
+ Build the local search index. This downloads the dataset catalog and computes embeddings (the model is about 22 MB; indexing takes roughly 30 s):
+
+ ```bash
+ sdgis index
+ ```
+
+ ## Quick Start
+
+ ```bash
+ # Semantic search: find relevant datasets without knowing exact names
+ sdgis search "bike infrastructure"
+ sdgis search "water and flooding"
+ sdgis search "affordable housing near transit"
+
+ # Understand a dataset before querying it (great for agents)
+ sdgis describe Bikeways
+
+ # Count features (with optional filter)
+ sdgis count Bikeways
+ sdgis count ABC_Licenses --where "LICENSE_TYPE='21'"
+
+ # Query features
+ sdgis query Bikeways --limit 5
+ sdgis query Bikeways --where "RD_NAME='Coast Blvd'" --fields "RD_NAME,CLASS"
+ sdgis query ABC_Licenses --bbox "-117.2,32.7,-117.1,32.8" --limit 50
+
+ # Output as JSON, CSV, or GeoJSON
+ sdgis query Bikeways --limit 100 -f json
+ sdgis query Bikeways --limit 100 -f csv > bikeways.csv
+ sdgis query Bikeways --limit 100 -f geojson > bikeways.geojson
+
+ # Fetch ALL features with automatic pagination
+ sdgis query-all Bikeways -f geojson > all_bikeways.geojson
+
+ # Download pre-built exports
+ sdgis download Bikeways -f shapefile
+ ```
+
+ ## Commands
+
+ | Command | Description |
+ |---------|-------------|
+ | `index` | Build local SQLite index with semantic embeddings |
+ | `search <query>` | Semantic / FTS / fuzzy search across all datasets |
+ | `describe <dataset>` | Schema + feature count + sample rows as JSON (agent-friendly) |
+ | `list` | List all available datasets |
+ | `info <dataset>` | Show schema, fields, metadata, and links |
+ | `fields <dataset>` | List all fields with types and domains |
+ | `head <dataset>` | Quick preview: 3 rows + schema summary |
+ | `count <dataset>` | Count total features (supports WHERE clause) |
+ | `query <dataset>` | Query features with filters, pagination, bounding box |
+ | `query-all <dataset>` | Fetch all features with automatic pagination |
+ | `sql <dataset> <where>` | Shorthand for WHERE clause queries |
+ | `download <dataset>` | Download pre-built GeoJSON / CSV / Shapefile / FGDB |
+ | `url <dataset>` | Generate REST, portal, or download URLs |
+ | `categories` | List the 18 dataset categories |
+
+ ## For AI Agents
+
+ Every command that returns data can write **clean JSON to stdout** with no ANSI codes, and status messages go to stderr. The examples below pass `--json-output` or a `-f` format flag on commands such as `search`, `count`, and `query`. This makes the tool easy to use with any LLM tool framework.
+
+ Typical agent workflow:
+
+ ```bash
+ # 1. Find relevant datasets
+ sdgis search "stormwater infrastructure" --json-output
+
+ # 2. Understand a dataset's schema and sample data in one call
+ sdgis describe Hydrological_Basins
+
+ # 3. Count matching features before pulling all data
+ sdgis count Hydrological_Basins --where "AREA_SQMI > 10" --json-output
+
+ # 4. Pull the data
+ sdgis query Hydrological_Basins --where "AREA_SQMI > 10" -f geojson
+ ```
+
+ ## Dataset Categories
+
+ Agriculture, Business, Census, Community, District, Ecology & Parks, Elevation,
+ Fire, Health & Public Safety, Hydrology & Geology, Jurisdiction, Landbase,
+ Land Use, Miscellaneous, Place, Transportation, Utilities, Zoning
+
+ ## Output Formats
+
+ - **table**: Rich-formatted terminal table (default, human-readable)
+ - **json**: Raw ArcGIS JSON response
+ - **geojson**: Standard GeoJSON FeatureCollection
+ - **csv**: Comma-separated values (attributes only)
+
+ ## Spatial Queries
+
+ Filter by bounding box in WGS84 longitude/latitude. The values in the example below are ordered `min_lon,min_lat,max_lon,max_lat`:
+
+ ```bash
+ sdgis query ABC_Licenses --bbox "-117.2,32.7,-117.1,32.8" --limit 100 -f geojson
+ ```
+
+ ## Piping & Scripting
+
+ ```bash
+ # Count features in every transportation dataset
+ sdgis search transportation --json-output | \
+   jq -r '.[].name' | \
+   while read -r ds; do
+     printf '%s: ' "$ds"
+     sdgis count "$ds" --json-output 2>/dev/null
+   done
+
+ # Convert to GeoPackage with ogr2ogr
+ sdgis query-all Bikeways -f geojson | ogr2ogr -f "GPKG" bikeways.gpkg /vsistdin/
+ ```
+
+ ## About the Data Warehouse
+
+ SanGIS and SANDAG have partnered to provide the San Diego region with a single authoritative source of GIS data through the **San Diego Regional Data Warehouse**. It contains hundreds of layers across 18 categories, collected from multiple sources including the City of San Diego, the County of San Diego, the State of California, and the federal government, all free for public use.
+
+ Datasets cover everything from addresses to zoning: roads and freeways, property and city boundaries, census areas, community planning areas, lakes, streams, business zones, and much more. Data is available as hosted feature services (for interactive viewing and metadata review) and as downloads in FileGDB, Shapefile, CSV, GeoJSON, and JSON formats.
+
+ > **Note:** Per California Assembly Bill 1785 (AB 1785), SanGIS no longer publishes parcel owner name and address information in publicly accessible online locations. For parcel owner data or technical issues, contact [webmaster@sangis.org](mailto:webmaster@sangis.org).
+
+ Data is provided for convenience with no warranty as to accuracy. Users should review the [SanGIS Legal Notice](https://www.sangis.org/legal-notices) and [SANDAG Privacy Policy](https://www.sandag.org/privacy-policy) prior to use.
+
+ ## Data Source
+
+ All data comes from the **San Diego Regional Data Warehouse** operated by SANDAG (San Diego Association of Governments) and SanGIS.
+
+ - Portal: https://geo.sandag.org
+ - REST Services: https://geo.sandag.org/server/rest/services/Hosted