tg_geometry 0.3.1 → 0.3.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +21 -0
- data/README.md +41 -1
- data/benchmark/_support.rb +11 -4
- data/benchmark/distance_memory_accounting.rb +46 -0
- data/benchmark/distance_point_geom.rb +49 -0
- data/benchmark/distance_within_radius.rb +75 -0
- data/docs/BENCHMARKING.md +28 -1
- data/docs/CONCURRENCY.md +8 -0
- data/docs/GEOMETRY_QUERIES.md +42 -0
- data/docs/LIMITATIONS.md +12 -0
- data/ext/tg_geometry/tg_geometry_ext.c +840 -0
- data/lib/tg/geometry/version.rb +1 -1
- data/lib/tg/geometry.rb +40 -0
- data/spec/distance_spec.rb +258 -0
- metadata +11 -6
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 7cb54eb85cd48036c2f32eb5f95fa3d19cbf994d375bc102b05f2d45dfb1f8ec
|
|
4
|
+
data.tar.gz: c5ff1c066708d8826c125185f34a346e1f1e2bf66f65fb1b1eb13e75bf955aad
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 981e37eb58fac8eaa20c3fa9604b751c263330e3f3dc2e8614cb1c91e36eb76d31431b4a13b3cbcd44475323d5ca76ec3e58c013d8c16de9d28e2ec890704962
|
|
7
|
+
data.tar.gz: 05db4ca563cd1c22e13d5c27e440845452397f50516cf01dde8675032fb3e12a527ba0fe52ccd32bc1fe313b3f8d083f46c17a54e4e8c316b6e8ce9766c278d5
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,26 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 0.3.2 - unreleased
|
|
4
|
+
|
|
5
|
+
### Added
|
|
6
|
+
|
|
7
|
+
- Added explicit point-to-geometry distance APIs on `TG::Geometry::Geom`:
|
|
8
|
+
`distance_to_lnglat_meters`, `boundary_distance_to_lnglat_meters`,
|
|
9
|
+
`nearest_point_lnglat`, `distance_to_xy`, `boundary_distance_to_xy`, and
|
|
10
|
+
`nearest_point_xy`.
|
|
11
|
+
- Added `TG::Geometry::Index#within_distance_lnglat_meters`,
|
|
12
|
+
`#within_distance_ids_lnglat_meters`, `#within_distance_xy`, and
|
|
13
|
+
`#within_distance_ids_xy` using bbox prefilter plus exact distance filtering.
|
|
14
|
+
- Added distance specs and benchmarks for point-to-geometry and radius queries, including
|
|
15
|
+
tiny-index/full-extent radius cases and distance receiver memory-accounting checks.
|
|
16
|
+
|
|
17
|
+
### Clarified
|
|
18
|
+
|
|
19
|
+
- lng/lat distance is approximate local equirectangular meters for local geofencing,
|
|
20
|
+
not geodesic distance; GeoJSON segments remain straight coordinate segments.
|
|
21
|
+
- The lng/lat metric is raw planar and does not wrap or split at the antimeridian.
|
|
22
|
+
- Distance query methods keep the GVL and do not claim Falcon/Async/Ractor behavior.
|
|
23
|
+
|
|
3
24
|
## 0.3.1 - 28.05.2026
|
|
4
25
|
|
|
5
26
|
### Changed
|
data/README.md
CHANGED
|
@@ -188,6 +188,46 @@ Predicate direction is explicit:
|
|
|
188
188
|
|
|
189
189
|
Results are ids only and preserve insertion order. Duplicate ids remain possible if duplicate ids were inserted.
|
|
190
190
|
|
|
191
|
+
|
|
192
|
+
|
|
193
|
+
## Point-to-geometry distance
|
|
194
|
+
|
|
195
|
+
`TG::Geometry::Geom` exposes explicit point distance APIs. Units are encoded in the method names; there is no `metric:` option and no automatic lng/lat-vs-XY detection.
|
|
196
|
+
|
|
197
|
+
```ruby
|
|
198
|
+
zone.distance_to_lnglat_meters(lng, lat) # => Float, approximate meters
|
|
199
|
+
zone.boundary_distance_to_lnglat_meters(lng, lat) # => Float, approximate meters
|
|
200
|
+
zone.nearest_point_lnglat(lng, lat) # => [lng, lat]
|
|
201
|
+
|
|
202
|
+
zone.distance_to_xy(x, y) # => Float in input coordinate units
|
|
203
|
+
zone.boundary_distance_to_xy(x, y) # => Float in input coordinate units
|
|
204
|
+
zone.nearest_point_xy(x, y) # => [x, y]
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
For areal geometries (`Polygon`, `MultiPolygon`, and areal collection members), `distance_to_*` returns `0.0` for points inside the covered area or on the boundary. Holes are excluded, so a point inside a hole measures to the nearest hole ring. `boundary_distance_to_*` always measures to the nearest boundary/ring/segment; for an interior point it does not return `0.0` merely because the point is covered. `nearest_point_*` returns the nearest boundary point for areal geometries, including interior queries.
|
|
208
|
+
|
|
209
|
+
Distance methods for lng/lat geometries return approximate meters using a per-query local equirectangular frame. Segments are GeoJSON straight coordinate segments, not great-circle arcs. This is geofencing-grade metric distance, not geodesy. Accuracy is intended for local geofencing and degrades with latitude separation.
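To make the "per-query local equirectangular frame" concrete, here is a minimal pure-Ruby sketch of the idea for two plain points. It is not the gem's implementation: the method name `local_equirect_meters`, the `radius_m` default (an assumed mean Earth radius), and the choice to scale longitude by the cosine of the query latitude are assumptions made for illustration only.

```ruby
# Hypothetical sketch, not the gem's native code.
# Approximate meters between a query point and a target point in a local
# equirectangular frame anchored at the query latitude.
def local_equirect_meters(query_lng, query_lat, lng, lat, radius_m: 6_371_008.8)
  deg_to_rad = Math::PI / 180.0
  # One degree of longitude shrinks by cos(latitude); latitude degrees do not.
  lng_scale = Math.cos(query_lat * deg_to_rad)
  dx = (lng - query_lng) * deg_to_rad * lng_scale * radius_m
  dy = (lat - query_lat) * deg_to_rad * radius_m
  Math.hypot(dx, dy)
end

# 0.01 degrees of longitude at the equator versus at 60 degrees north.
puts local_equirect_meters(0.0, 0.0, 0.01, 0.0).round(2)
puts local_equirect_meters(0.0, 60.0, 0.01, 60.0).round(2)
```

Because the longitude scale is fixed at the query latitude, the approximation drifts as the target moves to a different latitude, which is consistent with the accuracy caveat in the paragraph above.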
|
|
210
|
+
|
|
211
|
+
The lng/lat metric is raw planar lng/lat and does not wrap longitude at `+/-180`. A point at `179.9` and a point at `-179.9` are treated as about `360` degrees apart, matching the gem's planar `covers_xy?` model. Data that crosses the antimeridian should be cut at `+/-180` before import.
|
|
212
|
+
|
|
213
|
+
`Index` supports radius filters with an rtree bbox prefilter followed by exact distance filtering:
|
|
214
|
+
|
|
215
|
+
```ruby
|
|
216
|
+
index.within_distance_lnglat_meters(lng, lat, radius_m, sort: false)
|
|
217
|
+
# => [[id, distance_m], ...]
|
|
218
|
+
index.within_distance_ids_lnglat_meters(lng, lat, radius_m)
|
|
219
|
+
# => [id, ...]
|
|
220
|
+
|
|
221
|
+
index.within_distance_xy(x, y, radius, sort: false)
|
|
222
|
+
# => [[id, distance], ...]
|
|
223
|
+
index.within_distance_ids_xy(x, y, radius)
|
|
224
|
+
# => [id, ...]
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
`sort: true` sorts filtered `[id, distance]` pairs by ascending distance. The ids variants intentionally do not accept `sort:`. Index kNN / `nearest_ids` is not implemented.
|
|
228
|
+
|
|
229
|
+
Distance radius benchmarks compare `rtree prefilter + exact filter` against a brute-force full index scan. Any ratio from those benchmarks is a prefilter-vs-full-scan result, not a claim that the exact distance calculation itself is hundreds of times faster. The benchmark suite includes tiny-index/full-extent cases where the rtree prefilter may be neutral or slower.
|
|
230
|
+
|
|
191
231
|
## GeoJSON FeatureSource
|
|
192
232
|
|
|
193
233
|
`TG::Geometry::FeatureSource` reads GeoJSON `FeatureCollection` sources without `JSON.parse` of the whole document into Ruby Hash/Array objects.
|
|
@@ -345,7 +385,7 @@ Not included:
|
|
|
345
385
|
- Z/M variants of array constructors;
|
|
346
386
|
- Public `release_gvl:` option.
|
|
347
387
|
|
|
348
|
-
TG works in planar XY coordinates. If lon/lat coordinates are passed in, length, area, perimeter, and nearest-segment distances are in input coordinate units, not meters.
|
|
388
|
+
TG works in planar XY coordinates. If lon/lat coordinates are passed in, length, area, perimeter, and low-level nearest-segment distances are in input coordinate units, not meters. The explicit `*_lnglat_meters` point-to-geometry APIs are the exception: they return approximate local meters using the query-local frame documented above, not geodesic meters.
|
|
349
389
|
|
|
350
390
|
## Development
|
|
351
391
|
|
data/benchmark/_support.rb
CHANGED
|
@@ -292,7 +292,7 @@ module TGGeometryBench
|
|
|
292
292
|
:rect_index, :rect,
|
|
293
293
|
:segments, :target_bytes, :payload_bytes,
|
|
294
294
|
:points_per_batch, :threads,
|
|
295
|
-
:entries, :rebuilds, :cycle
|
|
295
|
+
:entries, :rebuilds, :cycle, :receiver
|
|
296
296
|
].freeze
|
|
297
297
|
|
|
298
298
|
TABLE_METRIC_COLUMNS = [
|
|
@@ -302,7 +302,8 @@ module TGGeometryBench
|
|
|
302
302
|
:iterations, :operations,
|
|
303
303
|
:geom_memsize, :flat_memsize, :rtree_memsize, :rtree_over_flat,
|
|
304
304
|
:start_rss_kb, :peak_rss_kb, :finish_rss_kb, :drift_kb, :max_drift_kb,
|
|
305
|
-
:elapsed_sec, :queries, :sample_count, :rss_kb
|
|
305
|
+
:elapsed_sec, :queries, :sample_count, :rss_kb,
|
|
306
|
+
:before_memsize, :after_memsize, :delta_memsize, :allocated_objects
|
|
306
307
|
].freeze
|
|
307
308
|
|
|
308
309
|
TABLE_INTERNAL_COLUMNS = [
|
|
@@ -348,6 +349,7 @@ module TGGeometryBench
|
|
|
348
349
|
rebuilds: "rebuilds",
|
|
349
350
|
queries: "queries",
|
|
350
351
|
cycle: "cycle",
|
|
352
|
+
receiver: "receiver",
|
|
351
353
|
rss_kb: "rss KB",
|
|
352
354
|
geom_memsize: "geom B",
|
|
353
355
|
flat_memsize: "flat B",
|
|
@@ -359,7 +361,11 @@ module TGGeometryBench
|
|
|
359
361
|
drift_kb: "drift KB",
|
|
360
362
|
max_drift_kb: "max drift KB",
|
|
361
363
|
elapsed_sec: "elapsed s",
|
|
362
|
-
sample_count: "samples"
|
|
364
|
+
sample_count: "samples",
|
|
365
|
+
before_memsize: "before B",
|
|
366
|
+
after_memsize: "after B",
|
|
367
|
+
delta_memsize: "delta B",
|
|
368
|
+
allocated_objects: "alloc objs"
|
|
363
369
|
}.freeze
|
|
364
370
|
|
|
365
371
|
def human_int(value)
|
|
@@ -407,7 +413,8 @@ module TGGeometryBench
|
|
|
407
413
|
:rss_kb, :geom_memsize, :flat_memsize, :rtree_memsize, :rtree_over_flat,
|
|
408
414
|
:start_rss_kb, :peak_rss_kb, :finish_rss_kb, :drift_kb, :max_drift_kb,
|
|
409
415
|
:target_bytes, :payload_bytes, :segments, :points_per_batch, :threads,
|
|
410
|
-
:sample_count, :median_minor_gc, :median_major_gc
|
|
416
|
+
:sample_count, :median_minor_gc, :median_major_gc,
|
|
417
|
+
:before_memsize, :after_memsize, :delta_memsize, :allocated_objects
|
|
411
418
|
human_int(value)
|
|
412
419
|
when :full, :adaptive, :gc_disabled
|
|
413
420
|
value ? "yes" : "no"
|
data/benchmark/distance_memory_accounting.rb
ADDED
|
@@ -0,0 +1,46 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require_relative "_support"
|
|
4
|
+
require "objspace"
|
|
5
|
+
|
|
6
|
+
TGGeometryBench.say_header("distance_memory_accounting")
|
|
7
|
+
|
|
8
|
+
geom = TG::Geometry.polygon([[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]])
|
|
9
|
+
index = TG::Geometry::Index.build([[:zone, geom]], via: :geom, strategy: :rtree)
|
|
10
|
+
|
|
11
|
+
CASES = [
|
|
12
|
+
["geom_distance_to_xy", "Geom", geom, -> { geom.distance_to_xy(5, 5) }],
|
|
13
|
+
["geom_boundary_distance_to_xy", "Geom", geom, -> { geom.boundary_distance_to_xy(5, 5) }],
|
|
14
|
+
["geom_nearest_point_xy", "Geom", geom, -> { geom.nearest_point_xy(5, 5) }],
|
|
15
|
+
["geom_distance_to_lnglat_meters", "Geom", geom, -> { geom.distance_to_lnglat_meters(0.01, 0.01) }],
|
|
16
|
+
["index_within_distance_xy", "Index", index, -> { index.within_distance_xy(5, 5, 10) }],
|
|
17
|
+
["index_within_distance_lnglat_meters", "Index", index, -> { index.within_distance_lnglat_meters(0.01, 0.01, 2_000) }]
|
|
18
|
+
].freeze
|
|
19
|
+
|
|
20
|
+
ITERATIONS = TGGeometryBench.env_integer("TGEOMETRY_DISTANCE_MEMORY_ITERATIONS", 10_000, min: 1)
|
|
21
|
+
|
|
22
|
+
CASES.each do |name, receiver_name, receiver, callable|
|
|
23
|
+
TGGeometryBench.gc_start
|
|
24
|
+
before_memsize = ObjectSpace.memsize_of(receiver)
|
|
25
|
+
before_allocated = GC.stat[:total_allocated_objects]
|
|
26
|
+
|
|
27
|
+
ITERATIONS.times { callable.call }
|
|
28
|
+
|
|
29
|
+
TGGeometryBench.gc_start
|
|
30
|
+
after_memsize = ObjectSpace.memsize_of(receiver)
|
|
31
|
+
after_allocated = GC.stat[:total_allocated_objects]
|
|
32
|
+
|
|
33
|
+
TGGeometryBench.report(
|
|
34
|
+
"distance_memory_accounting",
|
|
35
|
+
{
|
|
36
|
+
case: name,
|
|
37
|
+
receiver: receiver_name,
|
|
38
|
+
operations: ITERATIONS,
|
|
39
|
+
before_memsize: before_memsize,
|
|
40
|
+
after_memsize: after_memsize,
|
|
41
|
+
delta_memsize: after_memsize - before_memsize,
|
|
42
|
+
allocated_objects: after_allocated - before_allocated,
|
|
43
|
+
note: "delta B must stay 0; Ruby result allocations are expected for arrays/pairs"
|
|
44
|
+
}
|
|
45
|
+
)
|
|
46
|
+
end
|
data/benchmark/distance_point_geom.rb
ADDED
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require_relative "_support"
|
|
4
|
+
|
|
5
|
+
TGGeometryBench.say_header("distance_point_geom")
|
|
6
|
+
|
|
7
|
+
VERTEX_COUNTS = (TGGeometryBench.full? ? [16, 128, 1_024, 8_192] : [16, 128, 1_024]).freeze
|
|
8
|
+
|
|
9
|
+
def regular_polygon(vertex_count, radius: 1.0)
|
|
10
|
+
points = vertex_count.times.map do |i|
|
|
11
|
+
angle = (2.0 * Math::PI * i) / vertex_count
|
|
12
|
+
[Math.cos(angle) * radius, Math.sin(angle) * radius]
|
|
13
|
+
end
|
|
14
|
+
points << points.first
|
|
15
|
+
TG::Geometry.polygon(points)
|
|
16
|
+
end
|
|
17
|
+
|
|
18
|
+
def zigzag_line(vertex_count, scale: 1.0)
|
|
19
|
+
points = vertex_count.times.map { |i| [i.to_f * scale, (i.even? ? 0.0 : 1.0) * scale] }
|
|
20
|
+
TG::Geometry.line_string(points)
|
|
21
|
+
end
|
|
22
|
+
|
|
23
|
+
VERTEX_COUNTS.each do |vertices|
|
|
24
|
+
geometries = {
|
|
25
|
+
polygon_xy: regular_polygon(vertices, radius: 10.0),
|
|
26
|
+
line_xy: zigzag_line(vertices),
|
|
27
|
+
polygon_lnglat_meters: regular_polygon(vertices, radius: 0.01),
|
|
28
|
+
line_lnglat_meters: zigzag_line(vertices, scale: 0.0001)
|
|
29
|
+
}
|
|
30
|
+
|
|
31
|
+
geometries.each do |kind, geom|
|
|
32
|
+
method, args = case kind
|
|
33
|
+
when :polygon_xy then [:distance_to_xy, [20.0, 0.0]]
|
|
34
|
+
when :line_xy then [:distance_to_xy, [vertices / 2.0, 5.0]]
|
|
35
|
+
when :polygon_lnglat_meters then [:distance_to_lnglat_meters, [0.02, 0.0]]
|
|
36
|
+
when :line_lnglat_meters then [:distance_to_lnglat_meters, [vertices / 20_000.0, 0.002]]
|
|
37
|
+
end
|
|
38
|
+
|
|
39
|
+
stats = TGGeometryBench.measure_counted(initial_iterations: TGGeometryBench.initial_iterations(1_000)) do |iter|
|
|
40
|
+
iter.times { geom.public_send(method, *args) }
|
|
41
|
+
end
|
|
42
|
+
|
|
43
|
+
TGGeometryBench.report(
|
|
44
|
+
"distance_point_geom",
|
|
45
|
+
{ kind: kind, n: vertices, method: method, note: "distance methods keep the GVL" },
|
|
46
|
+
stats: stats
|
|
47
|
+
)
|
|
48
|
+
end
|
|
49
|
+
end
|
data/benchmark/distance_within_radius.rb
ADDED
|
@@ -0,0 +1,75 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require_relative "_support"
|
|
4
|
+
|
|
5
|
+
TGGeometryBench.say_header("distance_within_radius")
|
|
6
|
+
|
|
7
|
+
COUNTS = (TGGeometryBench.full? ? [1_000, 10_000, 50_000] : [1_000, 10_000]).freeze
|
|
8
|
+
# Contract-required honesty case: tiny index + radius covering the whole data extent.
|
|
9
|
+
# This shows the crossover zone where paying for the rtree prefilter may not help
|
|
10
|
+
# compared to a direct full scan. The result is empirical: on some Ruby/CPU builds
|
|
11
|
+
# rtree may still win, but the benchmark must expose the scenario instead of
|
|
12
|
+
# only showing selective-radius wins.
|
|
13
|
+
TINY_COUNTS = (TGGeometryBench.full? ? [50, 100, 250] : [50, 100]).freeze
|
|
14
|
+
|
|
15
|
+
def box(x, y, size = 0.8)
|
|
16
|
+
TG::Geometry.polygon([[x, y], [x + size, y], [x + size, y + size], [x, y + size], [x, y]])
|
|
17
|
+
end
|
|
18
|
+
|
|
19
|
+
def entries(count)
|
|
20
|
+
count.times.map do |i|
|
|
21
|
+
x = (i % 250).to_f * 2.0
|
|
22
|
+
y = (i / 250).to_f * 2.0
|
|
23
|
+
[i, box(x, y)]
|
|
24
|
+
end
|
|
25
|
+
end
|
|
26
|
+
|
|
27
|
+
def brute_force(entries, x, y, radius)
|
|
28
|
+
entries.filter_map do |id, geom|
|
|
29
|
+
distance = geom.distance_to_xy(x, y)
|
|
30
|
+
[id, distance] if distance <= radius
|
|
31
|
+
end
|
|
32
|
+
end
|
|
33
|
+
|
|
34
|
+
def measure_radius_case(data, index, count, query_name, x, y, radius, initial:)
|
|
35
|
+
[
|
|
36
|
+
[:rtree_prefilter_exact_filter, -> { index.within_distance_xy(x, y, radius) }],
|
|
37
|
+
[:brute_force_full_scan, -> { brute_force(data, x, y, radius) }]
|
|
38
|
+
].each do |method, callable|
|
|
39
|
+
stats = TGGeometryBench.measure_counted(initial_iterations: TGGeometryBench.initial_iterations(initial)) do |iterations|
|
|
40
|
+
iterations.times { callable.call }
|
|
41
|
+
end
|
|
42
|
+
|
|
43
|
+
TGGeometryBench.report(
|
|
44
|
+
"distance_within_radius",
|
|
45
|
+
{ n: count, query: query_name, method: method, note: "prefilter vs full scan; distance radius search keeps the GVL" },
|
|
46
|
+
stats: stats
|
|
47
|
+
)
|
|
48
|
+
end
|
|
49
|
+
end
|
|
50
|
+
|
|
51
|
+
COUNTS.each do |count|
|
|
52
|
+
data = entries(count)
|
|
53
|
+
index = TG::Geometry::Index.build(data, via: :geom, strategy: :rtree)
|
|
54
|
+
|
|
55
|
+
{
|
|
56
|
+
selective: [10.5, 10.5, 1.0, 250],
|
|
57
|
+
broad: [150.0, 20.0, 180.0, 25]
|
|
58
|
+
}.each do |query_name, (x, y, radius, initial)|
|
|
59
|
+
measure_radius_case(data, index, count, query_name, x, y, radius, initial: initial)
|
|
60
|
+
end
|
|
61
|
+
end
|
|
62
|
+
|
|
63
|
+
TINY_COUNTS.each do |count|
|
|
64
|
+
data = entries(count)
|
|
65
|
+
index = TG::Geometry::Index.build(data, via: :geom, strategy: :rtree)
|
|
66
|
+
|
|
67
|
+
# Query at the tiny grid center with a radius that covers the whole generated extent.
|
|
68
|
+
max_x = ((count - 1) % 250).to_f * 2.0 + 0.8
|
|
69
|
+
max_y = ((count - 1) / 250).to_f * 2.0 + 0.8
|
|
70
|
+
x = max_x / 2.0
|
|
71
|
+
y = max_y / 2.0
|
|
72
|
+
radius = Math.hypot(max_x, max_y) + 1.0
|
|
73
|
+
|
|
74
|
+
measure_radius_case(data, index, count, :tiny_full_extent, x, y, radius, initial: 100)
|
|
75
|
+
end
|
data/docs/BENCHMARKING.md
CHANGED
|
@@ -15,6 +15,9 @@ The repository includes these benchmark entry points:
|
|
|
15
15
|
- `benchmark/falcon_concurrency.rb`
|
|
16
16
|
- `benchmark/objectspace_memsize.rb`
|
|
17
17
|
- `benchmark/rss_stability.rb`
|
|
18
|
+
- `benchmark/distance_point_geom.rb`
|
|
19
|
+
- `benchmark/distance_within_radius.rb`
|
|
20
|
+
- `benchmark/distance_memory_accounting.rb`
|
|
18
21
|
|
|
19
22
|
Run after compiling the extension:
|
|
20
23
|
|
|
@@ -42,7 +45,11 @@ Benchmark generators cover:
|
|
|
42
45
|
- flat vs rtree;
|
|
43
46
|
- scalar vs packed batch;
|
|
44
47
|
- parse small/medium/large geometry strings;
|
|
45
|
-
- RSS stability over repeated build/query/free
|
|
48
|
+
- RSS stability over repeated build/query/free;
|
|
49
|
+
- point-to-geometry distance over different vertex counts;
|
|
50
|
+
- radius search as `rtree prefilter + exact filter` versus brute-force full index scan;
|
|
51
|
+
- tiny-index/full-extent radius cases where the rtree prefilter may not help;
|
|
52
|
+
- distance receiver memory accounting before and after repeated calls.
|
|
46
53
|
|
|
47
54
|
## Output format
|
|
48
55
|
|
|
@@ -54,6 +61,26 @@ kind=compact n=1000 query=point lon=0.4 lat=0.4 flat_sec=... rtree_sec=... flat_
|
|
|
54
61
|
|
|
55
62
|
These records are intentionally plain text so they can be redirected to files and compared across machines.
|
|
56
63
|
|
|
64
|
+
|
|
65
|
+
## Distance benchmarks
|
|
66
|
+
|
|
67
|
+
`benchmark/distance_point_geom.rb` measures point-to-geometry distance method cost over geometries with different vertex counts. The `*_lnglat_meters` rows measure the local equirectangular frame overhead; they are not geodesic benchmarks.
|
|
68
|
+
|
|
69
|
+
`benchmark/distance_within_radius.rb` compares two query strategies:
|
|
70
|
+
|
|
71
|
+
- `rtree_prefilter_exact_filter`: existing rtree bbox prefilter followed by exact distance filtering;
|
|
72
|
+
- `brute_force_full_scan`: direct scan over every geometry with the same exact distance method.
|
|
73
|
+
|
|
74
|
+
Any speedup ratio from this script is a prefilter-vs-full-scan result. It must not be described as “distance is N times faster”. The benchmark intentionally includes selective-radius cases where the index should help and tiny-index/full-extent-radius cases where the prefilter may be neutral or slower. If rtree still wins on a machine, document that measured result instead of inventing a crossover.
|
|
75
|
+
|
|
76
|
+
`benchmark/distance_memory_accounting.rb` checks `ObjectSpace.memsize_of` before and after repeated distance calls. The expected receiver `delta B` is `0`; Ruby result allocations are still expected for returned arrays and `[id, distance]` pairs.
|
|
77
|
+
|
|
78
|
+
For noisy rows, especially short selective radius runs, increase timing before publishing numbers:
|
|
79
|
+
|
|
80
|
+
```bash
|
|
81
|
+
TGEOMETRY_BENCH_MIN_SECONDS=1.0 bundle exec ruby benchmark/distance_within_radius.rb
|
|
82
|
+
```
|
|
83
|
+
|
|
57
84
|
## No `:auto` strategy yet
|
|
58
85
|
|
|
59
86
|
The first release does not expose `strategy: :auto`. Choosing a threshold requires project-owned benchmark output across the required scenario matrix. Internal rtree constants such as leaf capacity are not a flat-vs-rtree crossover threshold.
|
data/docs/CONCURRENCY.md
CHANGED
|
@@ -69,3 +69,11 @@ No Falcon or Async performance claim is made. `benchmark/falcon_concurrency.rb`
|
|
|
69
69
|
## Low-level borrowed wrappers
|
|
70
70
|
|
|
71
71
|
`TG::Geometry::Line`, `TG::Geometry::Ring`, `TG::Geometry::Polygon`, and borrowed GeometryCollection child `TG::Geometry::Geom` wrappers are immutable borrowed wrappers. They do not mutate or free child TG pointers. Each wrapper marks and compacts the parent `TG::Geometry::Geom` through `geom_owner`, so the parent native geometry remains alive while a child wrapper is in use.
|
|
72
|
+
|
|
73
|
+
## Distance query concurrency
|
|
74
|
+
|
|
75
|
+
Point-to-geometry distance methods are read-only over immutable `Geom` and `Index` objects. They do not mutate geometry, mutate index entries, cache query state on receivers, or add persistent native memory. They do not call `rb_gc_adjust_memory_usage`.
|
|
76
|
+
|
|
77
|
+
`Index#within_distance_*` uses rtree callbacks only to mark candidate ordinals in C memory. Ruby arrays and `[id, distance]` pairs are materialized after `rtree_search` returns under the GVL. The callback safety rules above apply unchanged.
|
|
78
|
+
|
|
79
|
+
Distance methods deliberately keep the GVL. Do not treat them as no-GVL, Falcon/Async-aware, or Ractor-shareable APIs.
|
data/docs/GEOMETRY_QUERIES.md
CHANGED
|
@@ -21,3 +21,45 @@ It does not affect `intersecting_geom_ids`, `covering_geom_ids`, or `containing_
|
|
|
21
21
|
Results are arrays of ids in insertion order. Duplicate ids are preserved if duplicate ids were inserted.
|
|
22
22
|
|
|
23
23
|
In v0.3.0 these operations run under the GVL. The heavy C phase is structured without Ruby API calls so it can be made no-GVL-safe later after benchmarking, but there is no public `release_gvl:` knob.
|
|
24
|
+
|
|
25
|
+
## Point-to-geometry distance queries
|
|
26
|
+
|
|
27
|
+
Distance APIs are intentionally split by unit/model:
|
|
28
|
+
|
|
29
|
+
| Receiver | Method | Units / result |
|
|
30
|
+
| --- | --- | --- |
|
|
31
|
+
| `Geom` | `distance_to_lnglat_meters(lng, lat)` | `Float`, approximate meters |
|
|
32
|
+
| `Geom` | `boundary_distance_to_lnglat_meters(lng, lat)` | `Float`, approximate meters to nearest boundary/segment/point |
|
|
33
|
+
| `Geom` | `nearest_point_lnglat(lng, lat)` | `[lng, lat]`, raw planar nearest point |
|
|
34
|
+
| `Geom` | `distance_to_xy(x, y)` | `Float`, input coordinate units |
|
|
35
|
+
| `Geom` | `boundary_distance_to_xy(x, y)` | `Float`, input coordinate units |
|
|
36
|
+
| `Geom` | `nearest_point_xy(x, y)` | `[x, y]`, input coordinate units |
|
|
37
|
+
| `Index` | `within_distance_lnglat_meters(lng, lat, radius_m, sort: false)` | `[[id, distance_m], ...]` |
|
|
38
|
+
| `Index` | `within_distance_ids_lnglat_meters(lng, lat, radius_m)` | `[id, ...]` |
|
|
39
|
+
| `Index` | `within_distance_xy(x, y, radius, sort: false)` | `[[id, distance], ...]` |
|
|
40
|
+
| `Index` | `within_distance_ids_xy(x, y, radius)` | `[id, ...]` |
|
|
41
|
+
|
|
42
|
+
Coordinate order is always `(lng, lat)` for `*_lnglat_meters` and `(x, y)` for `*_xy`. There is no `metric:` keyword and no automatic coordinate-system detection.
|
|
43
|
+
|
|
44
|
+
Areal geometry semantics:
|
|
45
|
+
|
|
46
|
+
- `Polygon`, `MultiPolygon`, and areal `GeometryCollection` members return `0.0` from `distance_to_*` when the query point is inside the covered area or on the boundary.
|
|
47
|
+
- Holes are excluded: a point inside a hole measures to the nearest hole-ring segment.
|
|
48
|
+
- `boundary_distance_to_*` always measures to the nearest boundary/ring/segment. For an interior point it does not return `0.0` merely because the point is covered.
|
|
49
|
+
- `nearest_point_*` returns the nearest boundary point for areal types, including interior queries.
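The areal rules above can be sketched in pure Ruby for a polygon given as `[exterior, hole, ...]` rings. This is an illustration of the documented semantics only, not the gem's native code; the helper names (`distance_sketch`, `boundary_distance_sketch`, and so on) are hypothetical, and the even-odd ray cast is a stand-in for whatever coverage test the extension actually uses.

```ruby
# Hypothetical sketch of the documented areal semantics; not the gem's code.

# Distance from point (px, py) to the segment (ax, ay)-(bx, by).
def point_segment_distance(px, py, ax, ay, bx, by)
  dx = bx - ax
  dy = by - ay
  len2 = (dx * dx + dy * dy).to_f
  t = len2.zero? ? 0.0 : (((px - ax) * dx + (py - ay) * dy) / len2).clamp(0.0, 1.0)
  Math.hypot(px - (ax + t * dx), py - (ay + t * dy))
end

# Minimum distance from the point to any segment of a closed ring.
def ring_distance(ring, px, py)
  ring.each_cons(2).map { |(ax, ay), (bx, by)| point_segment_distance(px, py, ax, ay, bx, by) }.min
end

# Even-odd ray casting; exact boundary hits are handled by the zero check below.
def inside_ring?(ring, px, py)
  ring.each_cons(2).count { |(ax, ay), (bx, by)|
    (ay > py) != (by > py) && px < ax + (py - ay) * (bx - ax).to_f / (by - ay)
  }.odd?
end

# Always measures to the nearest ring segment, even for covered points.
def boundary_distance_sketch(rings, px, py)
  rings.map { |ring| ring_distance(ring, px, py) }.min
end

# 0.0 when the point is covered (inside the exterior, outside every hole)
# or on a ring; otherwise the boundary distance.
def distance_sketch(rings, px, py)
  boundary = boundary_distance_sketch(rings, px, py)
  exterior, *holes = rings
  covered = inside_ring?(exterior, px, py) && holes.none? { |hole| inside_ring?(hole, px, py) }
  (covered || boundary.zero?) ? 0.0 : boundary
end

square_with_hole = [
  [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]], # exterior ring
  [[4, 4], [6, 4], [6, 6], [4, 6], [4, 4]]      # hole
]
p distance_sketch(square_with_hole, 2, 2)          # covered interior point
p boundary_distance_sketch(square_with_hole, 2, 2) # same point, boundary metric
p distance_sketch(square_with_hole, 5, 5)          # point inside the hole
```

The interior point and the hole point show the difference between the two families: the covered point has zero distance but a non-zero boundary distance, while the point in the hole measures to the hole ring in both cases.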
|
|
50
|
+
|
|
51
|
+
Non-areal geometry semantics:
|
|
52
|
+
|
|
53
|
+
- `Point` / `MultiPoint` measure to the nearest point.
|
|
54
|
+
- `LineString` / `MultiLineString` measure to the nearest segment.
|
|
55
|
+
- `GeometryCollection` returns the minimum over measurable members; empty members are skipped. A geometry with no measurable component raises `TG::Geometry::ArgumentError`.
|
|
56
|
+
|
|
57
|
+
Distance methods for lng/lat geometries return approximate meters using a per-query local equirectangular frame. Segments are GeoJSON straight coordinate segments, not great-circle arcs. This is geofencing-grade metric distance, not geodesy. Accuracy is intended for local geofencing and degrades with latitude separation.
|
|
58
|
+
|
|
59
|
+
The lng/lat metric is raw planar lng/lat. It does not wrap longitude at `+/-180`, does not split antimeridian boxes, and does not normalize returned `nearest_point_lnglat` longitudes. A geometry near `179.9` and a query near `-179.9` are treated as far apart in raw planar coordinates, consistent with `covers_xy?`. Cut antimeridian-crossing data at `+/-180` before import.
|
|
60
|
+
|
|
61
|
+
`Index#within_distance_*` uses two phases: one bbox prefilter through the existing index path and one exact distance filter over candidates. The returned distance is the exact filter value; it is not discarded and recomputed later. `sort: true` sorts the filtered pair array by ascending distance. The ids-only variants intentionally reject `sort:`.
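The two-phase shape described above can be sketched without the gem. In this hypothetical example the "geometries" are plain axis-aligned boxes, a linear scan stands in for the rtree bbox prefilter, and `within_distance_sketch` is an invented name; only the structure (prefilter, exact filter that keeps its distance, optional sort) mirrors the description.

```ruby
# Hypothetical sketch; boxes are [min_x, min_y, max_x, max_y].

# Exact distance from a point to a box (0.0 when inside or on the edge).
def point_box_distance(x, y, box)
  min_x, min_y, max_x, max_y = box
  dx = [min_x - x, 0.0, x - max_x].max
  dy = [min_y - y, 0.0, y - max_y].max
  Math.hypot(dx, dy)
end

def boxes_intersect?(a, b)
  a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3]
end

# entries: [[id, box], ...] in insertion order.
def within_distance_sketch(entries, x, y, radius, sort: false)
  query_box = [x - radius, y - radius, x + radius, y + radius]
  # Phase 1: bbox prefilter (a real index would use the rtree here).
  candidates = entries.select { |_id, box| boxes_intersect?(box, query_box) }
  # Phase 2: exact distance filter; the distance is kept, not recomputed later.
  pairs = candidates.filter_map do |id, box|
    distance = point_box_distance(x, y, box)
    [id, distance] if distance <= radius
  end
  sort ? pairs.sort_by { |_id, distance| distance } : pairs
end

entries = [
  [:c, [2, 2, 3, 3]],
  [:a, [0, 0, 1, 1]],
  [:b, [5, 0, 6, 1]],    # rejected by the bbox prefilter
  [:d, [3.2, 2.2, 4, 3]] # passes the bbox prefilter, fails the exact filter
]
p within_distance_sketch(entries, 1.5, 0.5, 2.0)
p within_distance_sketch(entries, 1.5, 0.5, 2.0, sort: true)
```

Entry `:d` is the case the exact phase exists for: its box overlaps the square query box, but its nearest corner is farther than the radius.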
|
|
62
|
+
|
|
63
|
+
Radius-query benchmarks should be read as `rtree prefilter + exact filter` versus brute-force full index scan. They measure the value of avoiding a full scan for selective radii, not a standalone speedup of the exact distance primitive. Tiny indexes and radii covering the whole data extent are benchmarked separately because the prefilter can become neutral or slower there.
|
|
64
|
+
|
|
65
|
+
No kNN, `nearest_ids`, rtree nearest traversal, projection/reprojection, signed distance, geometry-to-geometry distance, or geodesic distance is implemented by these APIs.
|
data/docs/LIMITATIONS.md
CHANGED
|
@@ -22,3 +22,15 @@ TG works in planar XY coordinates. If lon/lat coordinates are passed in, length,
|
|
|
22
22
|
FeatureSource reads the full source into memory. It avoids a Ruby `JSON.parse` object tree, but it is not a streaming backend. There is no gzip, NDGeoJSON, GeoJSONSeq, or FlatGeobuf support.
|
|
23
23
|
|
|
24
24
|
FeatureSource returns raw properties JSON strings. It does not parse properties into Ruby Hash objects.
|
|
25
|
+
|
|
26
|
+
## Distance limitations
|
|
27
|
+
|
|
28
|
+
`*_lnglat_meters` distance methods are approximate local-meter helpers for geofencing. They use a per-query local equirectangular frame anchored at the query latitude. Segments are GeoJSON straight coordinate segments, not great-circle arcs. Accuracy is intended for local geofencing and degrades with latitude separation. This is not geodesy.
|
|
29
|
+
|
|
30
|
+
The metric is raw planar lng/lat and does not wrap longitude at `+/-180`. Cross-antimeridian proximity is not detected: `179.9` and `-179.9` are treated as roughly `360` degrees apart. This matches the rest of the gem's planar model and `covers_xy?` behavior. Data that legitimately crosses the antimeridian should be cut at `+/-180` before import.
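A small pure-Ruby sketch of the antimeridian behaviour described above. Both helper names are hypothetical; `wrapped_lng_delta` is included only as a contrast and shows what a wrapping metric would report, which is not what this feature does.

```ruby
# Raw planar longitude difference: no wrap at +/-180 (the documented model).
def raw_lng_delta(a, b)
  (a - b).abs
end

# For contrast only: a wrapping difference, which these APIs do NOT compute.
def wrapped_lng_delta(a, b)
  d = (a - b).abs % 360.0
  d > 180.0 ? 360.0 - d : d
end

puts raw_lng_delta(179.9, -179.9).round(1)     # roughly 360 degrees apart
puts wrapped_lng_delta(179.9, -179.9).round(1) # what a wrapping metric would say
```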
|
|
31
|
+
|
|
32
|
+
Near the poles, longitude scale approaches zero. Distance methods remain finite and avoid NaN/Inf, but the local frame understates longitude distance. Radius queries near poles may therefore include accepted false positives relative to real geodesy; that is an accuracy limitation of the planar metric, not a geodesic guarantee.
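To see how quickly the longitude scale collapses, the following sketch prints the meters spanned by one degree of longitude in a cosine-scaled local frame at several latitudes. The method name and the `radius_m` default (an assumed mean Earth radius) are illustrative assumptions, not values taken from the gem.

```ruby
# Hypothetical sketch: meters per degree of longitude in a local
# equirectangular frame anchored at `lat` (degrees).
def meters_per_lng_degree(lat, radius_m: 6_371_008.8)
  deg_to_rad = Math::PI / 180.0
  deg_to_rad * radius_m * Math.cos(lat * deg_to_rad)
end

[0.0, 60.0, 89.0, 89.99].each do |lat|
  puts format("lat %6.2f: %12.2f m per degree of longitude", lat, meters_per_lng_degree(lat))
end
```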
|
|
33
|
+
|
|
34
|
+
`*_xy` distance methods never convert units. They return input coordinate units.
|
|
35
|
+
|
|
36
|
+
Distance queries keep the GVL. There is no no-GVL, Falcon, Async, Ractor, projection/reprojection, kNN, `nearest_ids`, signed distance, polygon-to-polygon distance, or geometry-to-geometry distance support in this feature.
|