ul-wukong 4.1.0
- checksums.yaml +15 -0
- data/.gitignore +60 -0
- data/.gitmodules +6 -0
- data/.rspec +2 -0
- data/.travis.yml +19 -0
- data/.yardopts +6 -0
- data/CHANGELOG.md +7 -0
- data/Gemfile +17 -0
- data/Guardfile +12 -0
- data/LICENSE.md +95 -0
- data/NOTES-travis.md +31 -0
- data/README-old.md +422 -0
- data/README.md +1308 -0
- data/Rakefile +28 -0
- data/TODO.md +99 -0
- data/bin/cutc +30 -0
- data/bin/cuttab +5 -0
- data/bin/greptrue +6 -0
- data/bin/md5sort +20 -0
- data/bin/setcat +11 -0
- data/bin/tabchar +5 -0
- data/bin/uniq-ord +59 -0
- data/bin/uniqc +3 -0
- data/bin/wu +34 -0
- data/bin/wu-clean-encoding +31 -0
- data/bin/wu-date +13 -0
- data/bin/wu-datetime +13 -0
- data/bin/wu-hist +3 -0
- data/bin/wu-lign +186 -0
- data/bin/wu-local +4 -0
- data/bin/wu-plus +9 -0
- data/bin/wu-source +5 -0
- data/bin/wu-sum +31 -0
- data/diagrams/wu_local.dot +39 -0
- data/diagrams/wu_local.dot.png +0 -0
- data/examples/Gemfile +38 -0
- data/examples/README.md +9 -0
- data/examples/basic/string_reverser.rb +23 -0
- data/examples/basic/tiny_count.rb +8 -0
- data/examples/basic/word_count/accumulator.rb +26 -0
- data/examples/basic/word_count/tokenizer.rb +13 -0
- data/examples/basic/word_count/word_count.rb +6 -0
- data/examples/dataflow/scraper_macro_flow.rb +28 -0
- data/examples/deploy_pack/Gemfile +6 -0
- data/examples/deploy_pack/README.md +6 -0
- data/examples/deploy_pack/a/b/c/.gitkeep +0 -0
- data/examples/deploy_pack/app/processors/string_reverser.rb +5 -0
- data/examples/deploy_pack/config/environment.rb +1 -0
- data/examples/dsl/dataflow/fibonacci_series.rb +101 -0
- data/examples/dsl/dataflow/scraper_macro_flow.rb +28 -0
- data/examples/dsl/dataflow/simple.rb +12 -0
- data/examples/dsl/dataflow/telegram.rb +45 -0
- data/examples/dsl/workflow/cherry_pie.dot +97 -0
- data/examples/dsl/workflow/cherry_pie.md +104 -0
- data/examples/dsl/workflow/cherry_pie.png +0 -0
- data/examples/dsl/workflow/cherry_pie.rb +101 -0
- data/examples/empty/.gitkeep +0 -0
- data/examples/examples_helper.rb +9 -0
- data/examples/geo.rb +4 -0
- data/examples/geo/geo_grids.numbers +0 -0
- data/examples/geo/geolocated.rb +331 -0
- data/examples/geo/quadtile.rb +69 -0
- data/examples/geo/spec/geolocated_spec.rb +247 -0
- data/examples/geo/tile_fetcher.rb +77 -0
- data/examples/graph/implied_geolocation/README.md +63 -0
- data/examples/graph/minimum_spanning_tree/airfares_graphviz.rb +73 -0
- data/examples/improver/tweet_summary.rb +73 -0
- data/examples/loadable.rb +2 -0
- data/examples/munging/airline_flights/airline_flights.rake +83 -0
- data/examples/munging/airline_flights/airplane.rb +0 -0
- data/examples/munging/airline_flights/airport_id_unification.rb +129 -0
- data/examples/munging/airline_flights/airport_ok_chars.rb +4 -0
- data/examples/munging/airline_flights/indexable.rb +75 -0
- data/examples/munging/airline_flights/indexable_spec.rb +90 -0
- data/examples/munging/airline_flights/reconcile_airports.rb +142 -0
- data/examples/munging/airline_flights/tasks.rake +83 -0
- data/examples/munging/airline_flights/topcities.rb +167 -0
- data/examples/munging/geo/geo_json.rb +54 -0
- data/examples/munging/geo/geo_models.rb +69 -0
- data/examples/munging/geo/geonames_models.rb +107 -0
- data/examples/munging/geo/iso_codes.rb +172 -0
- data/examples/munging/geo/reconcile_countries.rb +124 -0
- data/examples/munging/geo/tasks.rake +71 -0
- data/examples/munging/wikipedia/articles/extract_articles-parsed.rb +79 -0
- data/examples/munging/wikipedia/articles/extract_articles-templated.rb +136 -0
- data/examples/munging/wikipedia/articles/textualize_articles.rb +54 -0
- data/examples/munging/wikipedia/articles/verify_structure.rb +43 -0
- data/examples/munging/wikipedia/articles/wp2txt-LICENSE.txt +22 -0
- data/examples/munging/wikipedia/articles/wp2txt_article.rb +259 -0
- data/examples/munging/wikipedia/articles/wp2txt_utils.rb +452 -0
- data/examples/munging/wikipedia/dbpedia/dbpedia_common.rb +5 -0
- data/examples/munging/wikipedia/dbpedia/dbpedia_extract_geocoordinates.rb +78 -0
- data/examples/munging/wikipedia/dbpedia/extract_links-cruft.rb +66 -0
- data/examples/munging/wikipedia/dbpedia/extract_links.rb +260 -0
- data/examples/munging/wikipedia/dbpedia/sameas_extractor.rb +20 -0
- data/examples/rake_helper.rb +97 -0
- data/examples/ruby_project/Gemfile +6 -0
- data/examples/ruby_project/README.md +6 -0
- data/examples/ruby_project/a/b/c/.gitkeep +0 -0
- data/examples/server_logs/geo_ip_mapping/munge_geolite.rb +82 -0
- data/examples/server_logs/logline.rb +95 -0
- data/examples/server_logs/models.rb +66 -0
- data/examples/server_logs/page_counts.pig +48 -0
- data/examples/server_logs/server_logs-01-parse-script.rb +13 -0
- data/examples/server_logs/server_logs-02-histograms-full.rb +33 -0
- data/examples/server_logs/server_logs-02-histograms-mapper.rb +14 -0
- data/examples/server_logs/server_logs-03-breadcrumbs-full.rb +71 -0
- data/examples/server_logs/server_logs-04-page_page_edges-full.rb +40 -0
- data/examples/serverlogs/geo_ip_mapping/munge_geolite.rb +82 -0
- data/examples/serverlogs/models/logline.rb +102 -0
- data/examples/serverlogs/parser/apache_parser_widget.rb +46 -0
- data/examples/serverlogs/visit_paths/common.rb +4 -0
- data/examples/serverlogs/visit_paths/page_counts.pig +48 -0
- data/examples/serverlogs/visit_paths/serverlogs-01-parse-script.rb +11 -0
- data/examples/serverlogs/visit_paths/serverlogs-02-histograms-full.rb +31 -0
- data/examples/serverlogs/visit_paths/serverlogs-02-histograms-mapper.rb +12 -0
- data/examples/serverlogs/visit_paths/serverlogs-03-breadcrumbs-full.rb +67 -0
- data/examples/serverlogs/visit_paths/serverlogs-04-page_page_edges-full.rb +38 -0
- data/examples/splitter.rb +94 -0
- data/examples/string_reverser.rb +7 -0
- data/examples/text/pig_latin/pig_latinizer.rb +35 -0
- data/examples/text/pig_latin/pig_latinizer_widget.rb +16 -0
- data/examples/text/regional_flavor/README.md +14 -0
- data/examples/text/regional_flavor/article_wordbags.pig +39 -0
- data/examples/text/regional_flavor/j01-article_wordbags.rb +4 -0
- data/examples/text/regional_flavor/simple_pig_script.pig +27 -0
- data/examples/twitter.rb +5 -0
- data/lib/hanuman.rb +36 -0
- data/lib/hanuman/graph.rb +97 -0
- data/lib/hanuman/graphvizzer.rb +206 -0
- data/lib/hanuman/graphvizzer/gv_models.rb +161 -0
- data/lib/hanuman/graphvizzer/gv_presenter.rb +97 -0
- data/lib/hanuman/link.rb +35 -0
- data/lib/hanuman/registry.rb +46 -0
- data/lib/hanuman/stage.rb +128 -0
- data/lib/hanuman/tree.rb +67 -0
- data/lib/wu/geo.rb +4 -0
- data/lib/wu/geo/geo_grids.numbers +0 -0
- data/lib/wu/geo/geolocated.rb +331 -0
- data/lib/wu/geo/quadtile.rb +69 -0
- data/lib/wu/graph/union_find.rb +62 -0
- data/lib/wu/model/reconcilable.rb +63 -0
- data/lib/wu/munging.rb +71 -0
- data/lib/wu/social/models/twitter.rb +31 -0
- data/lib/wu/wikipedia/models.rb +20 -0
- data/lib/wukong.rb +54 -0
- data/lib/wukong/dataflow.rb +43 -0
- data/lib/wukong/doc_helpers.rb +14 -0
- data/lib/wukong/doc_helpers/dataflow_handler.rb +29 -0
- data/lib/wukong/doc_helpers/field_handler.rb +91 -0
- data/lib/wukong/doc_helpers/processor_handler.rb +29 -0
- data/lib/wukong/driver.rb +214 -0
- data/lib/wukong/driver/event_machine_driver.rb +15 -0
- data/lib/wukong/driver/wiring.rb +68 -0
- data/lib/wukong/local.rb +42 -0
- data/lib/wukong/local/runner.rb +96 -0
- data/lib/wukong/local/stdio_driver.rb +104 -0
- data/lib/wukong/logger.rb +102 -0
- data/lib/wukong/model/faker.rb +136 -0
- data/lib/wukong/model/flatpack_parser/flat.rb +60 -0
- data/lib/wukong/model/flatpack_parser/flatpack.rb +4 -0
- data/lib/wukong/model/flatpack_parser/lang.rb +46 -0
- data/lib/wukong/model/flatpack_parser/parser.rb +55 -0
- data/lib/wukong/model/flatpack_parser/tokens.rb +130 -0
- data/lib/wukong/plugin.rb +48 -0
- data/lib/wukong/processor.rb +110 -0
- data/lib/wukong/rake_helper.rb +6 -0
- data/lib/wukong/runner.rb +169 -0
- data/lib/wukong/runner/boot_sequence.rb +123 -0
- data/lib/wukong/runner/code_loader.rb +52 -0
- data/lib/wukong/runner/command_runner.rb +44 -0
- data/lib/wukong/runner/deploy_pack_loader.rb +75 -0
- data/lib/wukong/runner/help_message.rb +42 -0
- data/lib/wukong/source.rb +33 -0
- data/lib/wukong/source/source_driver.rb +74 -0
- data/lib/wukong/source/source_runner.rb +38 -0
- data/lib/wukong/spec_helpers.rb +74 -0
- data/lib/wukong/spec_helpers/integration_tests.rb +150 -0
- data/lib/wukong/spec_helpers/integration_tests/integration_test_matchers.rb +207 -0
- data/lib/wukong/spec_helpers/integration_tests/integration_test_runner.rb +97 -0
- data/lib/wukong/spec_helpers/shared_examples.rb +22 -0
- data/lib/wukong/spec_helpers/unit_tests.rb +135 -0
- data/lib/wukong/spec_helpers/unit_tests/unit_test_driver.rb +132 -0
- data/lib/wukong/spec_helpers/unit_tests/unit_test_matchers.rb +169 -0
- data/lib/wukong/spec_helpers/unit_tests/unit_test_runner.rb +60 -0
- data/lib/wukong/version.rb +3 -0
- data/lib/wukong/widget/echo.rb +55 -0
- data/lib/wukong/widget/extract.rb +122 -0
- data/lib/wukong/widget/filters.rb +452 -0
- data/lib/wukong/widget/logger.rb +56 -0
- data/lib/wukong/widget/operators.rb +82 -0
- data/lib/wukong/widget/reducers.rb +10 -0
- data/lib/wukong/widget/reducers/accumulator.rb +73 -0
- data/lib/wukong/widget/reducers/bin.rb +368 -0
- data/lib/wukong/widget/reducers/count.rb +73 -0
- data/lib/wukong/widget/reducers/group.rb +128 -0
- data/lib/wukong/widget/reducers/group_concat.rb +98 -0
- data/lib/wukong/widget/reducers/improver.rb +71 -0
- data/lib/wukong/widget/reducers/join_xml.rb +37 -0
- data/lib/wukong/widget/reducers/moments.rb +72 -0
- data/lib/wukong/widget/reducers/sort.rb +180 -0
- data/lib/wukong/widget/reducers/uniq.rb +91 -0
- data/lib/wukong/widget/serializers.rb +317 -0
- data/lib/wukong/widget/utils.rb +46 -0
- data/lib/wukong/widgets.rb +7 -0
- data/spec/examples/dataflow/fibonacci_series_spec.rb +18 -0
- data/spec/examples/dataflow/parse_apache_logs_spec.rb +8 -0
- data/spec/examples/dataflow/parsing_spec.rb +14 -0
- data/spec/examples/dataflow/simple_spec.rb +34 -0
- data/spec/examples/dataflow/telegram_spec.rb +43 -0
- data/spec/examples/graph/minimum_spanning_tree_spec.rb +34 -0
- data/spec/examples/munging/airline_flights/identifiers_spec.rb +16 -0
- data/spec/examples/munging/airline_flights_spec.rb +202 -0
- data/spec/examples/text/pig_latin_spec.rb +18 -0
- data/spec/examples/workflow/cherry_pie_spec.rb +36 -0
- data/spec/hanuman/graph_spec.rb +119 -0
- data/spec/hanuman/hanuman_spec.rb +10 -0
- data/spec/hanuman/registry_spec.rb +123 -0
- data/spec/hanuman/stage_spec.rb +81 -0
- data/spec/hanuman/tree_spec.rb +119 -0
- data/spec/spec.opts +1 -0
- data/spec/spec_helper.rb +43 -0
- data/spec/support/example_test_helpers.rb +95 -0
- data/spec/support/hanuman_test_helpers.rb +92 -0
- data/spec/support/integration_helper.rb +38 -0
- data/spec/support/model_test_helpers.rb +115 -0
- data/spec/support/shared_context_for_graphs.rb +57 -0
- data/spec/support/shared_context_for_reducers.rb +37 -0
- data/spec/support/shared_examples_for_builders.rb +94 -0
- data/spec/support/shared_examples_for_shortcuts.rb +57 -0
- data/spec/wu/model/reconcilable_spec.rb +152 -0
- data/spec/wukong/dataflow_spec.rb +87 -0
- data/spec/wukong/driver_spec.rb +154 -0
- data/spec/wukong/local/runner_spec.rb +29 -0
- data/spec/wukong/local/stdio_driver_spec.rb +73 -0
- data/spec/wukong/local_spec.rb +6 -0
- data/spec/wukong/logger_spec.rb +49 -0
- data/spec/wukong/model/faker_spec.rb +132 -0
- data/spec/wukong/processor_spec.rb +21 -0
- data/spec/wukong/runner_spec.rb +132 -0
- data/spec/wukong/source_spec.rb +6 -0
- data/spec/wukong/widget/extract_spec.rb +101 -0
- data/spec/wukong/widget/filters_spec.rb +79 -0
- data/spec/wukong/widget/logger_spec.rb +23 -0
- data/spec/wukong/widget/operators_spec.rb +25 -0
- data/spec/wukong/widget/reducers/bin_spec.rb +92 -0
- data/spec/wukong/widget/reducers/count_spec.rb +11 -0
- data/spec/wukong/widget/reducers/group_spec.rb +21 -0
- data/spec/wukong/widget/reducers/join_xml_spec.rb +25 -0
- data/spec/wukong/widget/reducers/moments_spec.rb +36 -0
- data/spec/wukong/widget/reducers/sort_spec.rb +26 -0
- data/spec/wukong/widget/reducers/uniq_spec.rb +14 -0
- data/spec/wukong/widget/serializers_spec.rb +114 -0
- data/spec/wukong/widget/sink_spec.rb +19 -0
- data/spec/wukong/widget/source_spec.rb +65 -0
- data/spec/wukong/wu-local_spec.rb +109 -0
- data/spec/wukong/wu-source_spec.rb +32 -0
- data/spec/wukong/wu_spec.rb +14 -0
- data/spec/wukong/wukong_spec.rb +10 -0
- data/wukong.gemspec +35 -0
- metadata +465 -0
@@ -0,0 +1,69 @@
+module Wukong
+  module Geo
+    class Quadtile
+      include Gorillib::Model
+      #
+      field :tile_x, Integer, position: 0, doc: "Tile X index, an integer between 0 and 2^zoom_level - 1"
+      field :tile_y, Integer, position: 1, doc: "Tile Y index, an integer between 0 and 2^zoom_level - 1"
+      field :zl,     Integer, position: 2, doc: "Zoom level of tile to fetch. 0 is the world; 16 is about a kilometer."
+      field :slug,   String,  default: 'tile', doc: "Name, prefixed on saved tiles"
+
+      def quadkey   ; Wukong::Geolocated.tile_xy_zl_to_quadkey(  tile_x, tile_y, zl) ; end
+      def packed_qk ; Wukong::Geolocated.tile_xy_zl_to_packed_qk(tile_x, tile_y, zl) ; end
+
+      # Base of URL for map tile server; anything Z/X/Y.png-addressable works,
+      # eg `http://b.tile.openstreetmap.org`. Defaults to `http://a.tile.stamen.com/toner-lite`.
+      class_attribute :tileserver_url_base
+      self.tileserver_url_base = 'http://a.tile.stamen.com/toner-lite'
+
+      def self.from_whatever(hsh)
+        zl = hsh[:zl] ? hsh[:zl].to_i : nil
+        case
+        when hsh[:tile_x].present? && hsh[:tile_y].present? && zl.present?
+          tile_x, tile_y = [hsh[:tile_x], hsh[:tile_y]]
+        when hsh[:longitude].present? && hsh[:latitude].present? && zl.present?
+          tile_x, tile_y = Wukong::Geolocated.lng_lat_zl_to_tile_xy(hsh[:longitude], hsh[:latitude], zl)
+        when hsh[:quadkey].present?
+          quadkey = hsh[:quadkey]
+          quadkey = quadkey[0...zl] if zl.to_i > 0 # keep exactly zl digits
+          tile_x, tile_y, zl = Wukong::Geolocated.quadkey_to_tile_xy_zl(quadkey)
+        else
+          raise ArgumentError, "You must supply keys for either `:longitude`, `:latitude` and `:zl`; `:tile_x`, `:tile_y` and `:zl`; or `:quadkey`: #{hsh.inspect}"
+        end
+        return new(tile_x, tile_y, zl, hsh.to_hash)
+      end
+
+      def self.tileserver_conn
+        @tileserver_conn = Faraday.new(:url => tileserver_url_base)
+      end
+
+      def tile_url
+        [tileserver_url_base, zl, tile_x, tile_y].join('/') << ".png"
+      end
+
+      # Relative path under which the tile is saved: slug/zl/slug-quadkey.ext
+      # NOTE: `options` is currently ignored; the defaults below always apply.
+      # @example
+      #   qt = Quadtile.from_whatever(longitude: -97.759003, latitude: 30.273884, zl: 15)
+      #   qt.basename # a path of the form "tile/15/tile-<quadkey>.png"
+      #
+      #
+      # @return [String]
+      def basename(options={})
+        options = { sep: '-', ext: 'png'}
+        sep = options[:sep]
+        # "%s%s%02d%s%04d%s%04d.%s" % [slug, sep, zl, sep, tile_x, sep, tile_y, options[:ext]]
+        "%s/%02d/%s%s%s.%s" % [slug, zl, slug, sep, quadkey, options[:ext]]
+      end
+
+      # Fetch the contents of a map tile from a tileserver
+      #
+      # You are responsible for requiring the faraday library and its adapter
+      #
+      def fetch
+        self.class.tileserver_conn.get(tile_url)
+      end
+
+    end
+  end
+end
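The `quadkey` and `packed_qk` methods above delegate to `Wukong::Geolocated`, whose source is not part of this hunk. As a rough sketch of the conventional quadkey encoding, one base-4 digit can be built per zoom level from one bit of X and one bit of Y. The module name `QuadkeySketch` and its error message are hypothetical and are not the gem's API; only the method names mirror the calls seen above.

```ruby
# Hypothetical helper, not part of Wukong: illustrates the conventional
# quadkey encoding (one base-4 digit per zoom level).
module QuadkeySketch
  module_function

  # Interleave the bits of tile_x and tile_y, most significant first.
  # Each digit is (y_bit << 1) | x_bit, so zoom level 0 gives "".
  def tile_xy_zl_to_quadkey(tile_x, tile_y, zl)
    (zl - 1).downto(0).map do |bit|
      x_bit = (tile_x >> bit) & 1
      y_bit = (tile_y >> bit) & 1
      ((y_bit << 1) | x_bit).to_s
    end.join
  end

  # Reverse the encoding; returns [tile_x, tile_y, zl].
  def quadkey_to_tile_xy_zl(quadkey)
    unless quadkey =~ /\A[0-3]*\z/
      raise ArgumentError, "Quadkey #{quadkey.inspect} has characters other than 0-3"
    end
    tile_x = tile_y = 0
    quadkey.each_char do |digit|
      d = digit.to_i
      tile_x = (tile_x << 1) | (d & 1)
      tile_y = (tile_y << 1) | (d >> 1)
    end
    [tile_x, tile_y, quadkey.length]
  end

  # A packed quadkey is the same digit string read as a base-4 integer.
  def tile_xy_zl_to_packed_qk(tile_x, tile_y, zl)
    tile_xy_zl_to_quadkey(tile_x, tile_y, zl).to_i(4)
  end
end
```

Because each extra digit subdivides the parent tile into four, a quadkey is always a prefix of the quadkeys of the tiles it contains, which is what makes truncating a quadkey to `zl` digits meaningful in `from_whatever`.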
@@ -0,0 +1,247 @@
+require 'gorillib/data_munging'
+require_relative '../geolocated'
+
+describe Wukong::Geolocated do
+  let(:aus_lng){ -97.759003 } # Austin, TX -- infochimps HQ
+  let(:aus_lat){ 30.273884 }
+  let(:sat_lng){ -98.486123 } # San Antonio, TX
+  let(:sat_lat){ 29.42575 }
+  let(:dpi){ 72 }
+  #
+  let(:aus_tile_x_3){ 1.82758 } # zoom level 3
+  let(:aus_tile_y_3){ 3.29356 }
+  let(:aus_pixel_x_3){ 468 }
+  let(:aus_pixel_y_3){ 843 }
+  #
+  let(:aus_tile_x_8){ 58.48248675555555 } # zoom level 8
+  let(:aus_tile_y_8){ 105.39405073699557 }
+  let(:aus_tile_x_11){ 467 } # zoom level 11
+  let(:aus_tile_y_11){ 843 }
+  #
+  let(:aus_quadkey ){ "0231301203311211" }
+  let(:aus_quadkey_3){ "023" }
+  let(:radius){ 1_000_000 } # 1,000 km
+
+  context Wukong::Geolocated::ByCoordinates do
+    let(:point_klass) do
+      module Wukong
+        class TestPoint
+          include Gorillib::Model
+          include Wukong::Geolocated::ByCoordinates
+          field :name,      String, position: 0, doc: "Name of this location"
+          field :longitude, Float,  position: 1, doc: "Longitude (X) of a point, in decimal degrees"
+          field :latitude,  Float,  position: 2, doc: "Latitude (Y) of a point, in decimal degrees"
+        end
+      end
+      Wukong::TestPoint
+    end
+    subject{ point_klass.new("Infochimps HQ", aus_lng, aus_lat) }
+
+    context '#tile_xf' do
+      it "tile X coordinate, as a float" do
+        subject.tile_xf(3).should  be_within(0.0001).of(  1.82758)
+        subject.tile_xf(8).should  be_within(0.0001).of( 58.48248)
+        subject.tile_xf(11).should be_within(0.0001).of(467.8598)
+      end
+    end
+    context '#tile_x' do
+      it "tile X coordinate, as an integer" do
+        subject.tile_x(3).should  == 1
+        subject.tile_x(8).should  == 58
+        subject.tile_x(11).should == 467
+      end
+    end
+    context '#tile_yf' do
+      it "tile Y coordinate, as a float" do
+        subject.tile_yf(3).should  be_within(0.0001).of(  3.29356)
+        subject.tile_yf(8).should  be_within(0.0001).of(105.394051)
+        subject.tile_yf(11).should be_within(0.0001).of(843.152406)
+      end
+    end
+    context '#tile_y' do
+      it "tile Y coordinate, as an integer" do
+        subject.tile_y(3).should  == 3
+        subject.tile_y(8).should  == 105
+        subject.tile_y(11).should == 843
+      end
+    end
+    context '#quadkey' do
+      it "a string of 2-bit tile selectors" do
+        subject.quadkey(3).should  == "023"
+        subject.quadkey(16).should == "0231301203311211"
+      end
+    end
+  end
+
+  context Wukong::Geolocated do
+
+    it "gives private methods on including class as well as the methods on itself" do
+      klass = Class.new{ include Wukong::Geolocated }
+      klass.should be_private_method_defined(:lng_lat_zl_to_tile_xy)
+      klass.should be_private_method_defined(:haversine_distance)
+    end
+
+    #
+    # Tile coordinates
+    #
+
+    it "returns a map tile size given a zoom level" do
+      Wukong::Geolocated.map_tile_size(3).should == 8
+    end
+
+    it "returns a tile_x, tile_y pair given a longitude, latitude and zoom level" do
+      Wukong::Geolocated.lng_lat_zl_to_tile_xy(aus_lng, aus_lat,  8).should == [ 58, 105]
+      Wukong::Geolocated.lng_lat_zl_to_tile_xy(aus_lng, aus_lat, 11).should == [467, 843]
+    end
+
+    it "returns a longitude, latitude pair given tile_x, tile_y and zoom level" do
+      lng, lat = Wukong::Geolocated.tile_xy_zl_to_lng_lat(aus_tile_x_8, aus_tile_y_8, 8)
+      lng.should be_within(0.0001).of(aus_lng)
+      lat.should be_within(0.0001).of(aus_lat)
+    end
+
+    #
+    # Pixel coordinates
+    #
+
+    it "returns a map pixel size given a zoom level" do
+      Wukong::Geolocated.map_pixel_size(3).should == 2048
+    end
+
+    it "returns a pixel_x, pixel_y pair given a longitude, latitude and zoom level" do
+      Wukong::Geolocated.lng_lat_zl_to_pixel_xy(aus_lng, aus_lat, 3).should == [468, 843]
+    end
+
+    it "returns a longitude, latitude pair given pixel_x, pixel_y and zoom level" do
+      lng, lat = Wukong::Geolocated.pixel_xy_zl_to_lng_lat(aus_pixel_x_3, aus_pixel_y_3, 3)
+      lat.round(4).should ==  30.2970
+      lng.round(4).should == -97.7344
+    end
+
+    it "returns a tile x-y pair given a pixel x-y pair" do
+      Wukong::Geolocated.pixel_xy_to_tile_xy(aus_pixel_x_3, aus_pixel_y_3).should == [1,3]
+    end
+
+    it "returns a pixel x-y pair given a float tile x-y pair" do
+      Wukong::Geolocated.tile_xy_to_pixel_xy(aus_tile_x_3, aus_tile_y_3 ).should == [467.86048, 843.15136]
+    end
+
+    it "returns a pixel x-y pair given an integer tile x-y pair" do
+      Wukong::Geolocated.tile_xy_to_pixel_xy(aus_tile_x_3.to_i, aus_tile_y_3.to_i).should == [256, 768]
+    end
+
+    #
+    # Quadkey coordinates
+    #
+
+    it "returns a quadkey given a tile x-y pair and a zoom level" do
+      Wukong::Geolocated.tile_xy_zl_to_quadkey(aus_tile_x_3,  aus_tile_y_3,   3).should == "023"
+      Wukong::Geolocated.tile_xy_zl_to_quadkey(aus_tile_x_8,  aus_tile_y_8,   8).should == "02313012"
+      Wukong::Geolocated.tile_xy_zl_to_quadkey(aus_tile_x_11, aus_tile_y_11, 11).should == "02313012033"
+    end
+
+    it "returns a quadkey given a longitude, latitude and a zoom level" do
+      Wukong::Geolocated.lng_lat_zl_to_quadkey(aus_lng, aus_lat,  3).should == "023"
+      Wukong::Geolocated.lng_lat_zl_to_quadkey(aus_lng, aus_lat,  8).should == "02313012"
+      Wukong::Geolocated.lng_lat_zl_to_quadkey(aus_lng, aus_lat, 11).should == "02313012033"
+      Wukong::Geolocated.lng_lat_zl_to_quadkey(aus_lng, aus_lat, 16).should == "0231301203311211"
+    end
+
+    it "returns a packed quadkey (an integer) given a tile xy and zoom level" do
+      Wukong::Geolocated.tile_xy_zl_to_packed_qk(aus_tile_x_3.floor,  aus_tile_y_3.floor,   3).should == "023".to_i(4)
+      Wukong::Geolocated.tile_xy_zl_to_packed_qk(aus_tile_x_8.floor,  aus_tile_y_8.floor,   8).should == "02313012".to_i(4)
+      Wukong::Geolocated.tile_xy_zl_to_packed_qk(aus_tile_x_11.floor, aus_tile_y_11.floor, 11).should == "02313012033".to_i(4)
+    end
+
+    context '.packed_qk_zl_to_tile_xy' do
+      let(:packed_qk){ "0231301203311211".to_i(4) }
+      it "returns a tile xy given a packed quadkey (integer)" do
+        Wukong::Geolocated.packed_qk_zl_to_tile_xy(packed_qk >> 26,  3).should == [  1,   3,  3]
+        Wukong::Geolocated.packed_qk_zl_to_tile_xy(packed_qk >> 16,  8).should == [ 58, 105,  8]
+        Wukong::Geolocated.packed_qk_zl_to_tile_xy(packed_qk >> 10, 11).should == [467, 843, 11]
+      end
+
+      it "defaults to zl=16 for packed quadkeys" do
+        Wukong::Geolocated.packed_qk_zl_to_tile_xy(packed_qk    ).should == [14971, 26980, 16]
+        Wukong::Geolocated.packed_qk_zl_to_tile_xy(packed_qk, 16).should == [14971, 26980, 16]
+      end
+    end
+
+    it "returns tile x-y pair and a zoom level given a quadkey" do
+      Wukong::Geolocated.quadkey_to_tile_xy_zl(aus_quadkey[0..2] ).should == [1, 3, 3]
+      Wukong::Geolocated.quadkey_to_tile_xy_zl(aus_quadkey[0..7] ).should == [aus_tile_x_8.floor,  aus_tile_y_8.floor,   8]
+      Wukong::Geolocated.quadkey_to_tile_xy_zl(aus_quadkey[0..10]).should == [aus_tile_x_11.floor, aus_tile_y_11.floor, 11]
+    end
+
+    it "allows '' to be a quadkey (whole map)" do
+      Wukong::Geolocated.quadkey_to_tile_xy_zl("").should == [0, 0, 0]
+    end
+
+    it "maps tile xyz [0,0,0] to quadkey ''" do
+      Wukong::Geolocated.tile_xy_zl_to_quadkey(0,0,0).should == ""
+    end
+
+    it "throws an error if a bad quadkey is given" do
+      expect{ Wukong::Geolocated.quadkey_to_tile_xy_zl("bad_key") }.to raise_error(ArgumentError, /Quadkey.*characters/)
+    end
+
+    it "returns a bounding box given a quadkey" do
+      left, btm, right, top = Wukong::Geolocated.quadkey_to_bbox(aus_quadkey_3)
+      left.should  be_within(0.0001).of(-135.0)
+      right.should be_within(0.0001).of(- 90.0)
+      btm.should   be_within(0.0001).of(   0.0)
+      top.should   be_within(0.0001).of(  40.9799)
+    end
+
+    it "returns the smallest quadkey containing two points" do
+      Wukong::Geolocated.quadkey_containing_bbox(aus_lng, aus_lat, sat_lng, sat_lat).should == "023130"
+    end
+
+    it "returns a bounding box given a point and radius" do
+      left, btm, right, top = Wukong::Geolocated.lng_lat_rad_to_bbox(aus_lng, aus_lat, radius)
+
+      left.should  be_within(0.0001).of(-108.1723)
+      right.should be_within(0.0001).of(- 87.3457)
+      btm.should   be_within(0.0001).of(  21.2807)
+      top.should   be_within(0.0001).of(  39.2671)
+    end
+
+    it "returns a centroid given a bounding box" do
+      mid_lng, mid_lat = Wukong::Geolocated.bbox_centroid([aus_lng, sat_lat], [sat_lng, aus_lat])
+      mid_lng.should be_within(0.0001).of(-98.1241)
+      mid_lat.should be_within(0.0001).of( 29.8503)
+    end
+
+    it "returns a pixel resolution given a latitude and zoom level" do
+      Wukong::Geolocated.pixel_resolution(aus_lat, 3).should be_within(0.0001).of(16880.4081)
+    end
+
+    it "returns a map scale given a latitude, zoom level and dpi" do
+      Wukong::Geolocated.map_scale_for_dpi(aus_lat, 3, dpi).should be_within(0.0001).of(47849975.8302)
+    end
+
+    it "calculates the haversine distance between two points" do
+      Wukong::Geolocated.haversine_distance(aus_lng, aus_lat, sat_lng, sat_lat).should be_within(0.0001).of(117522.1219)
+    end
+
+    it "calculates the haversine midpoint between two points" do
+      lng, lat = Wukong::Geolocated.haversine_midpoint(aus_lng, sat_lat, sat_lng, aus_lat)
+      lng.should be_within(0.0001).of(-98.1241)
+      lat.should be_within(0.0001).of( 29.8503)
+    end
+
+    it "calculates the point a given distance directly north from a lat/lng" do
+      lng, lat = Wukong::Geolocated.point_north(aus_lng, aus_lat, 1000000)
+      lng.should be_within(0.0001).of(-97.7590)
+      lat.should be_within(0.0001).of( 39.2671)
+    end
+
+    it "calculates the point a given distance directly east from a lat/lng" do
+      lng, lat = Wukong::Geolocated.point_east(aus_lng, aus_lat, 1000000)
+      lng.should be_within(0.0001).of(-87.3457)
+      lat.should be_within(0.0001).of( 30.2739)
+    end
+
+
+  end # module methods
+end
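The spec above pins down the tile coordinates of Austin at several zoom levels without showing the projection itself. A minimal sketch follows, assuming the standard Web Mercator ("slippy map") formulas. `TileMathSketch` and `lng_lat_zl_to_tile_xyf` are hypothetical names; the real `Wukong::Geolocated` implementation lives in `geolocated.rb` and may differ in detail.

```ruby
# Hypothetical helper, not the gem's implementation: the standard
# Web Mercator projection from degrees to tile coordinates.
module TileMathSketch
  module_function

  # Number of tiles along one edge of the map at this zoom level.
  def map_tile_size(zl)
    1 << zl
  end

  # Fractional tile coordinates [x, y] for a longitude/latitude in degrees.
  def lng_lat_zl_to_tile_xyf(lng, lat, zl)
    n       = map_tile_size(zl)
    lat_rad = lat * Math::PI / 180.0
    x = (lng + 180.0) / 360.0 * n
    y = (1.0 - Math.log(Math.tan(lat_rad) + 1.0 / Math.cos(lat_rad)) / Math::PI) / 2.0 * n
    [x, y]
  end

  # Integer tile indices: the fractional coordinates rounded down.
  def lng_lat_zl_to_tile_xy(lng, lat, zl)
    lng_lat_zl_to_tile_xyf(lng, lat, zl).map(&:floor)
  end
end
```

With the Austin coordinates used in the spec, this sketch should land on the same integer tiles that the spec expects at zoom levels 8 and 11.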
@@ -0,0 +1,77 @@
+#!/usr/bin/env ruby
+
+require 'faraday'
+require 'gorillib/pathname/utils'
+#
+require_relative '../rake_helper'
+require_relative '../geo'
+
+Pathname.register_paths(
+  images: [:root, 'images', 'map_grid_cells'],
+  )
+
+Settings.use :commandline
+Settings.define :server,  default: 'http://a.tile.stamen.com/toner-lite', description: "Map tile server; anything Z/X/Y.png-addressable works, eg http://b.tile.openstreetmap.org"
+Settings.define :clobber, default: true, type: :boolean, description: "true to overwrite files (the default)"
+
+Settings.define :slug, default: 'tile', description: "A name to prefix on the file"
+
+Settings.define :zl,                       description: "Zoom level of tile to fetch. An integer between 0 (world) and 16 or so"
+Settings.define :tile_x,    type: Integer, description: "Tile X index, an integer between 0 and 2^zoom_level - 1"
+Settings.define :tile_y,    type: Integer, description: "Tile Y index, an integer between 0 and 2^zoom_level - 1"
+Settings.define :longitude, type: Float,   description: "Longitude (X) of a point on the tile in decimal degrees"
+Settings.define :latitude,  type: Float,   description: "Latitude (Y) of a point on the tile in decimal degrees"
+Settings.define :quadkey,                  description: "Quadkey of tile, eg 002313012."
+
+Settings.resolve!
+
+def fetch_tile(tile_info)
+  tile = Wukong::Geo::Quadtile.from_whatever(tile_info)
+
+  Pathname.of(:images, tile.basename(Settings.slug)).if_missing(force: Settings.clobber) do |output_file|
+    Log.info "Writing to file #{output_file.path} from #{tile.tile_url}"
+    output_file << tile.fetch.body
+  end
+end
+
+tile_info = Settings.to_hash
+
+MAX_TILES_TO_FETCH = 1e4 unless defined?(MAX_TILES_TO_FETCH)
+# MAX_TILES_TO_FETCH = 18 unless defined?(MAX_TILES_TO_FETCH)
+
+def quadkey_range(quadkey, zl, zl_max, &block)
+  return if zl > zl_max
+  p [quadkey, zl, zl_max]
+  #
+  if quadkey.length >= zl
+    yield quadkey[0...zl] # the ancestor (or self) with exactly zl digits
+    quadkey_range(quadkey, zl+1, zl_max, &block)
+  else
+    n_tiles = 4 ** (zl_max - quadkey.length)
+    if (n_tiles > MAX_TILES_TO_FETCH) then raise "Too many sub-tiles: #{quadkey} at zl #{zl}..#{zl_max} would create #{n_tiles} tiles; limit is #{MAX_TILES_TO_FETCH}" ; end
+    #
+    (0..3).each do |quad|
+      quadkey_range("#{quadkey}#{quad}", zl, zl_max, &block)
+    end
+  end
+end
+
+# Guess the zoom level from quadkey if missing
+Settings.zl ||= Settings.quadkey.length.to_s if Settings.quadkey.present?
+# and then extract the range if any
+zl_min, zl_max = Settings.zl.split('-', 2)
+zl_min = zl_min.to_i
+zl_max = zl_max ? zl_max.to_i : zl_min
+
+if Settings.quadkey.present?
+  Settings.quadkey.gsub!(/_/, '')
+
+  quadkey_range(Settings.quadkey, zl_min, zl_max) do |quadkey|
+    fetch_tile(tile_info.merge(quadkey: quadkey, zl: quadkey.length))
+  end
+
+else
+  (zl_min.to_i .. zl_max.to_i).each do |zl|
+    fetch_tile(tile_info.merge(zl: zl))
+  end
+end
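The `quadkey_range` recursion above fans out by a factor of four per zoom level, which is why the script caps the tile count before descending. The same expansion can be sketched without recursion. `descendant_quadkeys` and its `max_tiles` parameter are hypothetical names chosen for this illustration, not part of the script.

```ruby
# Hypothetical helper, not part of the gem: list every quadkey that lies
# under `quadkey` at exactly `zl` digits, refusing runaway expansions.
def descendant_quadkeys(quadkey, zl, max_tiles = 10_000)
  depth = zl - quadkey.length
  raise ArgumentError, "zl #{zl} is shallower than #{quadkey.inspect}" if depth < 0
  n_tiles = 4**depth
  raise ArgumentError, "#{n_tiles} tiles exceeds limit of #{max_tiles}" if n_tiles > max_tiles

  keys = [quadkey]
  depth.times do
    # Each pass appends one digit, replacing every key with its four children.
    keys = keys.flat_map { |key| %w[0 1 2 3].map { |quad| key + quad } }
  end
  keys
end
```

Checking the count up front, as the script does with `MAX_TILES_TO_FETCH`, avoids discovering the blow-up only after thousands of requests have already been sent to the tile server.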
@@ -0,0 +1,63 @@
+# Implied Geolocation
+
+* Some objects are explicitly geolocated: "Austin, Texas", "Cornell University", the "USS_Constitution".
+* Some objects are not only geolocated, they are 'places' -- present as well in the geonames dataset.
+
+The estimator is as follows:
+
+* a best-estimate longitude and latitude
+* the radius of uncertainty for the point
+* the likelihood the point is erroneous
+
+    12000 krec articles
+     7000 krec geonames
+      400 krec dbpedia-geo_coordinates_en.json
+       87 krec dbpedia-geonames_links.json
+
+
+
+### dispatch geolocation estimates along links
+
+* Send every neighbor your geoestimate
+
+* Accumulate all neighbors' geoestimates.
+
+
+In this drawing, the vertical bars show implied locations; six reasonably nearby each other and two with large error.
+
+        |      | |       |  ||               |          |
+    ----+------+-+-------+--++------- // ----+---- // --+-----
+
+But of course in some places I _know_ the location
+
+        |    X | |       |  ||               |          |
+    ----+----X-+-+-------+--++------- // ----+---- // --+-----
+             X
+             `-- actual location
+
+
+Why are the estimates spread from the actual?
+
+* intrinsic size of the actual: the graph neighbors of "Texas" are spread over a much larger area than the graph neighbors of "Yee-Haw Junction, FL".
+* strength of the relationship: for example, this naive model can't tell the difference between "X is located in Y" and "X borders Y"
+* errors in the relationship: the link might be irrelevant or not explanatory for any reason -- anything from "X has the same area as Virginia" to a hacked page.
+* multi-modal location: Davy Crockett (TODO: verify) was from XXX to XXX the representative of Tennessee (location #1) to the US Congress in Washington, DC (location #2). Upon losing re-election, he famously said "You can all go to hell, I am going to Texas"; he died during the battle of the Alamo. The most robust assignment of a geolocation to "Davy Crockett" would look something like the following cartoon:
+
+       ____
+      /    \         ------
+     /      \       /      \      +-+
+    |        |_____|        |____/   \
+
+     Tennessee       Texas        DC
+
+
+So what we're going to do is track two separate types of error:
+
+* the likelihood the estimate is drawn from purely irrelevant points
+* assuming the estimates are relevant, the fuzziness of the implied geolocation.
+
+
+
+* ?? only use estimates with some strength ??
+* For all known points, the number of neighbors that are irrelevant
+
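The README names three outputs for the estimator (a best-estimate position, a radius of uncertainty, and a likelihood of error) but stops short of an algorithm. One way the accumulated neighbor estimates could be summarized is sketched below. Everything in it is an assumption made for illustration: the `GeoEstimate` struct, the use of coordinate-wise medians, the `outlier_factor` threshold, and measuring distance in plain degrees. It also does not address the multi-modal case described above.

```ruby
# Hypothetical sketch, not from the gem: summarize neighbors' implied
# locations as a robust center, a radius, and an outlier fraction.
GeoEstimate = Struct.new(:longitude, :latitude, :radius, :outlier_fraction)

def median(values)
  sorted = values.sort
  mid    = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# points: array of [longitude, latitude] pairs, in degrees.
# outlier_factor: how many median-distances away counts as irrelevant
# (an arbitrary tuning knob chosen for this illustration).
def estimate_location(points, outlier_factor = 5.0)
  raise ArgumentError, "need at least one point" if points.empty?
  center_lng = median(points.map(&:first))
  center_lat = median(points.map(&:last))
  # Plain distance in degrees keeps the sketch short; a real version
  # would use a great-circle distance such as haversine.
  distances = points.map { |lng, lat| Math.hypot(lng - center_lng, lat - center_lat) }
  radius    = median(distances)
  outliers  = distances.count { |d| d > outlier_factor * radius }
  GeoEstimate.new(center_lng, center_lat, radius, outliers.to_f / points.length)
end
```

Medians are used here instead of means so that the two far-away bars in the drawing do not drag the center toward them; the fraction of points beyond the threshold then stands in for the "likelihood the estimate is drawn from purely irrelevant points".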