geoparquet-extractor 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +24 -0
- package/README.md +222 -0
- package/dist/compressors-BMIjN4b7.js +7322 -0
- package/dist/compressors-BMIjN4b7.js.map +1 -0
- package/dist/geoparquet-extractor.js +3273 -0
- package/dist/geoparquet-extractor.js.map +1 -0
- package/dist/gpkg_worker.js +3108 -0
- package/dist/gpkg_worker.js.map +1 -0
- package/dist/types/duckdb_adapter.d.ts +63 -0
- package/dist/types/extent_data.d.ts +73 -0
- package/dist/types/extractor.d.ts +165 -0
- package/dist/types/formats/base.d.ts +141 -0
- package/dist/types/formats/csv.d.ts +14 -0
- package/dist/types/formats/dxf.d.ts +19 -0
- package/dist/types/formats/dxf_writer.d.ts +26 -0
- package/dist/types/formats/geojson.d.ts +17 -0
- package/dist/types/formats/geopackage.d.ts +22 -0
- package/dist/types/formats/geoparquet.d.ts +17 -0
- package/dist/types/formats/kml.d.ts +14 -0
- package/dist/types/formats/kml_writer.d.ts +4 -0
- package/dist/types/formats/shapefile.d.ts +14 -0
- package/dist/types/formats/shp_writer.d.ts +60 -0
- package/dist/types/index.d.ts +14 -0
- package/dist/types/metadata/default.d.ts +45 -0
- package/dist/types/metadata/provider.d.ts +84 -0
- package/dist/types/utils.d.ts +91 -0
- package/dist/wa-sqlite-async.wasm +0 -0
- package/package.json +68 -0
package/LICENSE
ADDED
@@ -0,0 +1,24 @@
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <https://unlicense.org>
package/README.md
ADDED
@@ -0,0 +1,222 @@
# geoparquet-extractor

[npm package](https://www.npmjs.com/package/geoparquet-extractor)
[Latest release](https://github.com/ramSeraph/geoparquet_extractor/releases/latest)

Extract and convert spatial data from remote GeoParquet files in the browser. Supports bbox filtering, multiple output formats, and pluggable metadata providers.

> **Browser-only**: requires the Origin Private File System (OPFS), Web Workers, and Web Locks APIs.

## Installation

```bash
npm install geoparquet-extractor
```

## Quick Start

```javascript
import { GeoParquetExtractor, createDuckDBClient } from 'geoparquet-extractor';

// You initialize DuckDB yourself
import * as duckdb from 'duckdb-wasm-opfs-tempdir';

// `db` must be your initialized AsyncDuckDB instance (see "DuckDB Setup").
const client = await createDuckDBClient(db, {
  extensions: ['spatial', 'httpfs'],
});

const extractor = new GeoParquetExtractor({ duckdb: client });

await extractor.extract({
  urls: ['https://example.com/data.parquet'],
  bbox: [77.5, 12.9, 77.7, 13.1],
  format: 'geoparquet',
  baseName: 'my-data',
  onProgress: (pct) => console.log(`${pct}%`),
  onStatus: (msg) => console.log(msg),
});
```

## Features

- **9 output formats**: GeoParquet (v1.1 & v2.0), GeoPackage, Shapefile, CSV, GeoJSON, GeoJSONSeq, KML, DXF
- **Spatial filtering**: Bbox intersection with per-partition and per-row-group optimization
- **Pluggable metadata**: Override how partition URLs and bboxes are resolved
- **Extent visualization data**: Fetch partition/row-group bboxes as GeoJSON for map display
- **DuckDB-powered**: Spatial SQL queries via DuckDB WASM (you provide the instance)
- **Self-contained GeoPackage worker**: wa-sqlite-rtree is bundled into the worker, so no CDN is needed

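
The per-partition part of the spatial filtering listed above can be pictured as a plain bbox overlap test. The sketch below is not the library's code: `bboxIntersects` and `selectPartitions` are hypothetical helper names, and the extents are assumed to be shaped like `{ filename: [minx, miny, maxx, maxy] }`, the shape this README documents for `MetadataProvider.getExtents`.

```javascript
// Hypothetical helpers, not part of geoparquet-extractor's API.
// All bboxes are [minx, miny, maxx, maxy].

// True when two bboxes overlap or touch.
function bboxIntersects(a, b) {
  return a[0] <= b[2] && a[2] >= b[0] && a[1] <= b[3] && a[3] >= b[1];
}

// Keep only the partitions whose extent overlaps the query bbox,
// so files that cannot contain matching rows are never fetched.
function selectPartitions(extents, queryBbox) {
  return Object.entries(extents)
    .filter(([, extent]) => bboxIntersects(extent, queryBbox))
    .map(([filename]) => filename);
}

// Made-up extents for illustration.
const extents = {
  'part-0.parquet': [77.0, 12.5, 77.6, 13.0],
  'part-1.parquet': [78.0, 12.5, 78.6, 13.0],
};

const selected = selectPartitions(extents, [77.5, 12.9, 77.7, 13.1]);
console.log(selected); // only part-0.parquet overlaps the query bbox
```

The same test could be applied again to row-group bboxes inside each selected file, which is presumably what the per-row-group optimization refers to.
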
## Formats

| Format | Value | Extension | Notes |
|--------|-------|-----------|-------|
| GeoPackage | `geopackage` | `.gpkg` | Requires GeoPackage worker |
| GeoJSON | `geojson` | `.geojson` | FeatureCollection |
| GeoJSONSeq | `geojsonseq` | `.geojsonl` | Newline-delimited |
| GeoParquet v1.1 | `geoparquet` | `.parquet` | With Hilbert spatial sort |
| GeoParquet v2.0 | `geoparquet2` | `.parquet` | Native geometry encoding |
| CSV | `csv` | `.csv` | WKT geometry column |
| Shapefile | `shapefile` | `.shp` | 2 GB limit per component |
| KML | `kml` | `.kml` | XML format |
| DXF | `dxf` | `.dxf` | AutoCAD R14, UTM projection |

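
If an application needs to predict output file names, the table above can be mirrored as a lookup. This is only a sketch: `FORMAT_EXTENSIONS` and `outputFileName` are hypothetical names, and the library's real naming may differ (a Shapefile, for instance, consists of several component files).

```javascript
// Format values and extensions copied from the table above.
const FORMAT_EXTENSIONS = new Map([
  ['geopackage', '.gpkg'],
  ['geojson', '.geojson'],
  ['geojsonseq', '.geojsonl'],
  ['geoparquet', '.parquet'],
  ['geoparquet2', '.parquet'],
  ['csv', '.csv'],
  ['shapefile', '.shp'],
  ['kml', '.kml'],
  ['dxf', '.dxf'],
]);

// Hypothetical helper: build a file name from a baseName and a format value.
function outputFileName(baseName, format) {
  const ext = FORMAT_EXTENSIONS.get(format);
  if (ext === undefined) {
    throw new Error(`Unknown format: ${format}`);
  }
  return baseName + ext;
}

console.log(outputFileName('my-data', 'geopackage')); // my-data.gpkg
```
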
## DuckDB Setup

The library does NOT bundle DuckDB WASM. You initialize it yourself and pass it in:

```javascript
import { createDuckDBClient } from 'geoparquet-extractor';
import * as duckdb from 'duckdb-wasm-opfs-tempdir';

// Standard duckdb-wasm-opfs-tempdir init
const MANUAL_BUNDLES = { /* your bundle config */ };
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
const worker = new Worker(bundle.mainWorker);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

// Wrap it for the library
const client = await createDuckDBClient(db, {
  extensions: ['spatial', 'httpfs'],
});
```

### Custom DuckDB Builds

The `duckdb-wasm-opfs-tempdir` package supports `SET temp_directory = 'opfs://...'` for large downloads that exceed browser memory limits. The library's `createDuckDBClient` adapter works with any DuckDB WASM build that provides `AsyncDuckDB`.

## GeoPackage Worker

The GeoPackage format requires a Web Worker for wa-sqlite. The library ships a self-contained worker with wa-sqlite-rtree bundled in:

```javascript
// Option 1: URL to hosted worker
const extractor = new GeoParquetExtractor({
  duckdb: client,
  gpkgWorkerUrl: '/workers/gpkg_worker.js',
});

// Option 2: Worker instance (import.meta.url resolves to dist/gpkg_worker.js)
const worker = new Worker(new URL('geoparquet-extractor/gpkg-worker', import.meta.url), { type: 'module' });
const extractor = new GeoParquetExtractor({
  duckdb: client,
  gpkgWorker: worker,
});
```

> **Note**: The worker requires `wa-sqlite-async.wasm` to be served from the same directory as `gpkg_worker.js`. Both files are included in the `dist/` directory.

## Custom Metadata Provider

Override how partition URLs and bboxes are resolved:

```javascript
import { MetadataProvider, GeoParquetExtractor } from 'geoparquet-extractor';

class MyMetadataProvider extends MetadataProvider {
  async getParquetUrls(sourceUrl) {
    const meta = await fetch(sourceUrl + '.meta.json').then(r => r.json());
    const baseUrl = sourceUrl.replace(/[^/]+$/, '');
    return Object.keys(meta.extents).map(f => baseUrl + f);
  }

  async getExtents(sourceUrl) {
    const meta = await fetch(sourceUrl + '.meta.json').then(r => r.json());
    return meta.extents; // { "file.parquet": [minx, miny, maxx, maxy] }
  }

  async getBbox(sourceUrl, duckdb) {
    const extents = await this.getExtents(sourceUrl);
    // Compute overall bbox from all partition extents
    let bbox = [Infinity, Infinity, -Infinity, -Infinity];
    for (const ext of Object.values(extents)) {
      bbox[0] = Math.min(bbox[0], ext[0]);
      bbox[1] = Math.min(bbox[1], ext[1]);
      bbox[2] = Math.max(bbox[2], ext[2]);
      bbox[3] = Math.max(bbox[3], ext[3]);
    }
    return bbox;
  }
}

const extractor = new GeoParquetExtractor({
  duckdb: client,
  metadataProvider: new MyMetadataProvider(),
});
```

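
To make the provider example above concrete, here is the kind of sidecar document it assumes and what it would derive from it. The `.meta.json` layout is inferred from that example and is not a format the library mandates; `resolvePartitionUrls` and `unionBbox` are hypothetical standalone copies of the `getParquetUrls` and `getBbox` logic with the network fetch removed, and the extents are made up.

```javascript
// Assumed content of the document served at sourceUrl + '.meta.json'.
const meta = {
  extents: {
    'part-0.parquet': [77.0, 12.5, 77.6, 13.0],
    'part-1.parquet': [77.6, 12.5, 78.2, 13.0],
  },
};

// Resolve partition file names against the directory of sourceUrl.
function resolvePartitionUrls(sourceUrl, meta) {
  const baseUrl = sourceUrl.replace(/[^/]+$/, ''); // drop the last path segment
  return Object.keys(meta.extents).map((f) => baseUrl + f);
}

// Smallest bbox that covers every partition extent.
function unionBbox(extents) {
  return Object.values(extents).reduce(
    (acc, ext) => [
      Math.min(acc[0], ext[0]),
      Math.min(acc[1], ext[1]),
      Math.max(acc[2], ext[2]),
      Math.max(acc[3], ext[3]),
    ],
    [Infinity, Infinity, -Infinity, -Infinity],
  );
}

const urls = resolvePartitionUrls('https://example.com/data/index', meta);
const overall = unionBbox(meta.extents);
console.log(urls);    // the two partitions under https://example.com/data/
console.log(overall); // [77, 12.5, 78.2, 13]
```
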
## Extent Visualization

Fetch partition and row-group bboxes as GeoJSON for map display:

```javascript
import { ExtentData, DefaultMetadataProvider } from 'geoparquet-extractor';

const extentData = new ExtentData({
  metadataProvider: new DefaultMetadataProvider(),
  duckdb: client,
});

const { dataExtents, rgExtents } = await extentData.fetchExtents({
  sourceUrl: 'https://example.com/data.mosaic.json',
  partitioned: true,
});

// Convert to GeoJSON for map rendering
const { polygons, labelPoints } = extentData.toGeoJSON(dataExtents);
// polygons: FeatureCollection of bbox rectangles
// labelPoints: FeatureCollection of label anchor points
```

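
For readers who want to picture the two collections, the sketch below builds comparable GeoJSON from a `{ name: [minx, miny, maxx, maxy] }` map. It is an illustration only: `extentsToGeoJSON` is a hypothetical function, and the `name` property and centre-point label placement are assumptions, so the output of the real `toGeoJSON` may differ in detail.

```javascript
// Hypothetical sketch; not the library's toGeoJSON implementation.
function extentsToGeoJSON(extents) {
  const polygons = [];
  const labelPoints = [];

  for (const [name, [minx, miny, maxx, maxy]] of Object.entries(extents)) {
    polygons.push({
      type: 'Feature',
      properties: { name },
      geometry: {
        type: 'Polygon',
        // One closed exterior ring, counter-clockwise, first point repeated last.
        coordinates: [[
          [minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny],
        ]],
      },
    });
    labelPoints.push({
      type: 'Feature',
      properties: { name },
      // Assumed placement: the centre of the bbox.
      geometry: { type: 'Point', coordinates: [(minx + maxx) / 2, (miny + maxy) / 2] },
    });
  }

  return {
    polygons: { type: 'FeatureCollection', features: polygons },
    labelPoints: { type: 'FeatureCollection', features: labelPoints },
  };
}

const demo = extentsToGeoJSON({ 'a.parquet': [0, 0, 2, 4] });
console.log(demo.labelPoints.features[0].geometry.coordinates); // [1, 2]
```
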
## API

### `GeoParquetExtractor`

Main orchestrator class.

- `constructor({ duckdb, metadataProvider?, gpkgWorkerUrl?, gpkgWorker?, memoryLimitMB? })`
- `prepare(options)` → Returns format handler for inspection before download
- `download(handler, { baseName, onProgress?, onStatus? })` → Execute download
- `extract(options)` → Convenience: prepare + download in one call
- `cancel()` → Cancel in-flight download
- `static cleanupOrphanedFiles()` → Clean up OPFS files from dead sessions
- `static getDownloadBaseName(sourceName, bbox)` → Generate suggested filename

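
The split between `prepare` and `download` listed above allows a confirmation step between the two. The following is a minimal sketch under stated assumptions: `extractWithConfirmation` and its `confirm` callback are hypothetical, `prepare` is assumed to accept the same `urls`, `bbox`, and `format` options shown in Quick Start, and `await` is applied to `prepare` in case it returns a promise.

```javascript
// `extractor` is expected to be a GeoParquetExtractor instance.
// `confirm` is a hypothetical callback that inspects the prepared handler
// and returns (or resolves to) true when the download should go ahead.
async function extractWithConfirmation(extractor, options, baseName, confirm) {
  const handler = await extractor.prepare(options);

  if (!(await confirm(handler))) {
    return false; // caller declined, nothing is downloaded
  }

  await extractor.download(handler, {
    baseName,
    onProgress: (pct) => console.log(`${pct}%`),
    onStatus: (msg) => console.log(msg),
  });
  return true;
}
```

A caller might pass a `confirm` function that shows the handler to the user and resolves to their choice; `extract(options)` remains the shorter path when no inspection is needed.
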
### `ExtentData`

Data-fetching for partition/row-group bboxes.

- `constructor({ metadataProvider, duckdb? })`
- `fetchExtents({ sourceUrl, partitioned?, includeRowGroups?, onStatus? })` → `{ dataExtents, rgExtents }`
- `toGeoJSON(extents)` → `{ polygons, labelPoints }`

### `MetadataProvider`

Abstract base class. Override to customize metadata resolution.

- `getParquetUrls(sourceUrl)` → `string[]`
- `getExtents(sourceUrl)` → `{ filename: [minx, miny, maxx, maxy] }`
- `getBbox(sourceUrl, duckdb)` → `[minx, miny, maxx, maxy]`
- `getRowGroupBboxes(parquetUrl, duckdb)` → `{ rg_N: bbox }`
- `getRowGroupBboxesMulti(urls, duckdb)` → `{ filename: { rg_N: bbox } }`

### `createDuckDBClient(db, options?)`

Wraps an `AsyncDuckDB` instance into the library's DuckDBClient interface.

## CORS Proxy

If your parquet files need a CORS proxy:

```javascript
import { setProxyUrl } from 'geoparquet-extractor';

// Set a custom proxy URL transformer
setProxyUrl((url) => `/proxy?url=${encodeURIComponent(url)}`);
```

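
To see what such a transformer produces, the same arrow function can be run on its own. The `/proxy` endpoint and the `https://app.example.com` origin are placeholders; the proxy itself is assumed to read the target back from the `url` query parameter.

```javascript
// Same transformer as in the example above, defined standalone for inspection.
const toProxyUrl = (url) => `/proxy?url=${encodeURIComponent(url)}`;

const proxied = toProxyUrl('https://example.com/data.parquet');
console.log(proxied);
// → /proxy?url=https%3A%2F%2Fexample.com%2Fdata.parquet

// What a proxy would recover from the query string (placeholder origin).
const recovered = new URL(proxied, 'https://app.example.com').searchParams.get('url');
console.log(recovered);
// → https://example.com/data.parquet
```

Encoding the whole target with `encodeURIComponent` keeps any query string on the original URL from being confused with the proxy's own parameters.
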
## License

[Unlicense](LICENSE): public domain.