s3-querier 1.1.1 → 1.1.2
package/package.json
CHANGED
@@ -25,6 +25,29 @@ Location tokens — override endpoint or bucket per path:
 Glob syntax — wildcard matching for non-time path segments:
 jobs/window=202308032130/*.parquet
 
+QUERYING TIME-PARTITIONED DATA OVER A RANGE
+
+When data is partitioned by time and you need multiple hours, days, or months, use date
+tokens in the SQL with `from`/`to` as separate parameters. ONE query with tokens downloads
+all matching files across the range — do not make multiple tool calls with hardcoded dates.
+
+✗ WRONG — three separate tool calls with hardcoded hours:
+sql: SELECT * FROM read_parquet('events/year=2026/month=06/day=15/hour=12/data.parquet')
+sql: SELECT * FROM read_parquet('events/year=2026/month=06/day=15/hour=13/data.parquet')
+sql: SELECT * FROM read_parquet('events/year=2026/month=06/day=15/hour=14/data.parquet')
+
+✓ CORRECT — one tool call, tokens expand across all hours in the range:
+sql: SELECT * FROM read_parquet('events/year={yyyy}/month={MM}/day={dd}/hour={hh}/data.parquet', union_by_name=1)
+from: 2026-06-15T12:00:00Z
+to: 2026-06-15T14:59:59Z
+
+Tokens also expand inside the filename, not just in path directory segments:
+data/year={yyyy}/month={MM}/day={dd}/hour={hh}/file_{yyyy}{MM}{dd}{hh}00.parquet
+→ s3-querier downloads one file per hour in the from/to range.
+
+Hardcoding a date is fine for a single known file or a fixed point-in-time lookup where
+you are not spanning multiple time segments.
+
 HIVE-PARTITIONED DATA
 
 For paths partitioned only by year and month (no day segment), use {yyyy} and {MM} together:
@@ -41,6 +64,11 @@ EXAMPLES
 Single file:
 SELECT * FROM read_parquet('reports/summary.parquet') LIMIT 10
 
+Hour-partitioned files — tokens in path and filename (requires from/to):
+SELECT * FROM read_parquet(
+'events/year={yyyy}/month={MM}/day={dd}/hour={hh}/file_{yyyy}{MM}{dd}{hh}00.parquet',
+union_by_name=1)
+
 Day-partitioned files (requires from/to):
 SELECT id FROM read_parquet('events/year={yyyy}/month={MM}/day={dd}/data.parquet', union_by_name=1)
 
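
The text added in 1.1.2 says that date tokens expand across the whole `from`/`to` range, in both directory segments and the filename. The diff does not show how s3-querier implements this, so the following is only a minimal sketch of the idea. The function name `expand_tokens`, the `TOKENS` table, and the hour-by-hour stepping are assumptions made for illustration; only the token names (`{yyyy}`, `{MM}`, `{dd}`, `{hh}`) and the example path come from the package text.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from the documented tokens to strftime codes.
# The token names appear in the package text; the mapping is an assumption.
TOKENS = {"{yyyy}": "%Y", "{MM}": "%m", "{dd}": "%d", "{hh}": "%H"}


def expand_tokens(template, start, end):
    """Return the distinct paths produced by filling the tokens for every
    hour between start and end (inclusive), in chronological order."""
    paths = []
    seen = set()
    # Step through the range one hour at a time, starting on the hour.
    current = start.replace(minute=0, second=0, microsecond=0)
    while current <= end:
        path = template
        for token, fmt in TOKENS.items():
            path = path.replace(token, current.strftime(fmt))
        # Templates without {hh} yield the same path for many hours,
        # so keep only the first occurrence of each path.
        if path not in seen:
            seen.add(path)
            paths.append(path)
        current += timedelta(hours=1)
    return paths


template = (
    "data/year={yyyy}/month={MM}/day={dd}/hour={hh}/"
    "file_{yyyy}{MM}{dd}{hh}00.parquet"
)
start = datetime(2026, 6, 15, 12, 0, 0, tzinfo=timezone.utc)
end = datetime(2026, 6, 15, 14, 59, 59, tzinfo=timezone.utc)

paths = expand_tokens(template, start, end)
for p in paths:
    print(p)
```

With the `from`/`to` values from the CORRECT example, the sketch yields three paths, one each for hours 12, 13, and 14, which matches the "one file per hour" behavior described in the added text. A day-partitioned template without `{hh}` collapses to a single path for the same range because duplicates are dropped.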