gis_scraper 0.1.2.pre → 0.1.3.pre

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 707ed88a572b008ddf44c0b67dd6fbd23cf78b3e
4
- data.tar.gz: c5e8da833c1459487d31f81ccf0421676fe2cdf1
3
+ metadata.gz: d095719f299da91d96069b76373e675a37a81842
4
+ data.tar.gz: 4d35174846e8bb601151e5a862e4ad39f40ee7e0
5
5
  SHA512:
6
- metadata.gz: aa9e8ad923993aad591c43fc2d3bdf50e9ba9500504ebff4947781a0a9641d2cc5254ca6f701aa2961c84665685196680fb67afeb5c97fcf67eda736dc1303ed
7
- data.tar.gz: 51dbfe5cf8d3dbf72dc8f3ca772514857cef062f585a91067613f7cb1f59d848c6a5f338c5de8acdf70d1b6dd91247910377b56bdf016c628fa767e344267f8e
6
+ metadata.gz: 69eb9967e11ff58f9de54feb8032c6097a6e7855e817acf62451b08c3df378c1c7a7c7e579f790c35fb0dacf4a1db82d13eb6d3b1c7221a3b1cdb013be705e34
7
+ data.tar.gz: b114a1f53b9fa97a0960e819f2ac26ae010dad9d417782451687f72f7884a4f950e0562f76ac27de09108809cda00d5c83cbe05d272d980fd393a09317869776
data/.gitignore CHANGED
@@ -1,3 +1,4 @@
1
1
  **/.DS_Store
2
2
  /Gemfile.lock
3
3
  /pkg/
4
+ /tmp/
data/.travis.yml CHANGED
@@ -1,5 +1,22 @@
1
1
  language: ruby
2
2
 
3
+ addons:
4
+ postgresql: "9.4"
5
+
6
+ services:
7
+ - postgresql
8
+
9
+ before_script:
10
+ - psql -c 'create database travis_ci_test;' -U postgres
11
+ - psql -U postgres -c 'create extension postgis;'
12
+
13
+ before_install:
14
+ - gem update bundler
15
+ # http://askubuntu.com/questions/206593/how-to-install-rgdal-on-ubuntu-12-10
16
+ - sudo apt-get update -qq
17
+ - sudo apt-get install -y aptitude
18
+ - sudo aptitude install -y libgdal-dev libproj-dev
19
+
3
20
  rvm:
4
21
  - 2.0.0
5
22
  - 2.1.6
data/README.md CHANGED
@@ -2,21 +2,117 @@
2
2
  [![Gem Version](https://badge.fury.io/rb/gis_scraper.svg)](http://badge.fury.io/rb/gis_scraper)
3
3
  [![Build status](https://secure.travis-ci.org/MatzFan/gis_scraper.svg)](http://travis-ci.org/MatzFan/gis_scraper)
4
4
 
5
- Utility to recursively scrape ArcGIS MapServer data using REST API.
5
+ Utility to recursively scrape ArcGIS MapServer data using the ArcGIS REST API.
6
6
 
7
- ArcGIS MapServer REST queries are limited to 1,000 objects in some cases. This tool makes repeated calls until all data for a given layer is extracted. It then merges the resulting JSON files into a single file. This allows GIS clients like QGIS to add a layer from a single file.
7
+ ArcGIS MapServer REST queries are limited to 1,000 objects in some cases. This tool makes repeated calls until all data for a given layer (and all sub-layers) is extracted. Output can be JSON file format or data may be written directly to Postgres database tables in PostGIS format. GIS clients - e.g. QGIS - can be configured to use vector layer data from PostGIS sources.
8
8
 
9
- **Usage**
9
+ ## Requirements
10
10
 
11
- The executable is called 'gisget' and takes one required arg - a MapServer/Layer URL (ending in an integer representing the layer number). An optional file output path may also be specified. If omitted the file will be saved in current directory. Example:
11
+ Ruby 2.0 or above - see Travis badge for tested Ruby versions.
12
12
 
13
+ A Postgres database with the PostGIS extension enabled for database export.
14
+
15
+ For data import to a database [GDAL](http://gdal.org) must be installed and specifically the [ogr2ogr](http://www.gdal.org/ogr2ogr.html) executable must be available in your path.
16
+
17
+ ## Known Limitations
18
+
19
+ *NIX systems only - Linux/Mac OS X/Linux. ArcGIS MapServer data is readable directly by ArcGIS Windows clients.
20
+
21
+ The following esri geometry types are supported:
22
+
23
+ - esriGeometryPoint, esriGeometryMultipoint, esriGeometryLine, esriGeometryPolyline, esriGeometryPolygon
24
+
25
+ ## Installation
26
+
27
+ Add this line to your application's Gemfile:
28
+
29
+ ```ruby
30
+ gem 'gis_scraper'
31
+ ```
32
+
33
+ And then execute:
34
+
35
+ $ bundle
36
+
37
+ Or install it yourself as:
38
+
39
+ $ gem install gis_scraper
40
+
41
+ ## Configuration
42
+
43
+ Configuration options may be set via a hash or specified in a Yaml file. The following options are available:
44
+
45
+ - ```:threads``` Scraping is multi-threaded. The number of threads to use may be set with this option (default: 8)
46
+ - ```:output_path``` For JSON output, the path used to write files to (default: '~/Desktop')
47
+
48
+ The following options are used to connect to a database:
49
+
50
+ - ```:host``` (default: 'localhost')
51
+ - ```:port``` (default: 5432)
52
+ - ```:dbname``` (default: 'postgres')
53
+ - ```:user``` (default: 'postgres')
54
+ - ```:password``` (default: nil)
55
+
56
+ These additional options are available when using output to a database and are applied to the ```ogr2ogr``` command:
57
+
58
+ - ```:srs``` Used to overide the source spacial reference system. Currently only EPSG string format is valid - e.g. 'EPSG:3109' (default: no overide)
59
+
60
+ **To set via a hash**
61
+
62
+ ```Ruby
63
+ GisScraper.configure(:threads => 16)
64
+ ```
65
+
66
+ **Using a Yaml configuration file**
67
+
68
+ ```Ruby
69
+ GisScraper.configure_with 'path-to-Yaml-file'
70
+ ```
71
+
72
+ ```Ruby
73
+ GisScraper.config # returns the hash of configuration values
74
+ ```
75
+
76
+ ## Usage
77
+
78
+ A Layer object must be instantiated with one required arg - a MapServer/Layer URL (ending in an integer representing the layer number). Example:
79
+
80
+ ```
81
+ layer = Layer.new('http://gps.digimap.gg/arcgis/rest/services/StatesOfJersey/JerseyMappingOL/MapServer/0')
82
+ ```
83
+
84
+ An optional second argument for the output path for JSON files may be specified. If so this overides the configuration option. Example:
85
+
86
+ ```
87
+ layer = Layer.new('http://gps.digimap.gg/arcgis/rest/services/StatesOfJersey/JerseyMappingOL/MapServer/0', '~/Desktop')
13
88
  ```
14
- gisget http://gps.digimap.gg/arcgis/rest/services/StatesOfJersey/JerseyMappingOL/MapServer/0 ~/Desktop
89
+
90
+ **JSON output**
91
+
92
+ ```
93
+ layer.output_json
15
94
  ```
16
95
 
17
96
  If the layer is type 'Feature Layer', a single file of JSON data will be saved (named the same as the layer). If the layer is type 'Group Layer', the sub-group structure is traversed recursively thus: Directories for each sub-group layer are created and JSON data files for each constituent feature layer written to them.
18
97
 
19
- **Specification and Tests**
98
+ **Output to a database**
99
+
100
+ Valid database config options must be set. The following command will convert JSON files, create tables for each layer (& sub-layers, if any) and import the data.
101
+
102
+ ```
103
+ layer.output_to_db
104
+ ```
105
+
106
+ ## Specification and Tests
107
+
108
+ For the full specification clone this repo and run:
109
+
110
+ `rake spec`
111
+
112
+ ## Contributing
113
+
114
+ Bug reports, pull requests (and feature requests) are welcome on GitHub at https://github.com/MatzFan/gis_scraper.
20
115
 
21
- rspec spec
116
+ ## License
22
117
 
118
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses)
data/gis_scraper.gemspec CHANGED
@@ -12,6 +12,7 @@ Gem::Specification.new do |s|
12
12
 
13
13
  s.summary = %q{Scrapes ArcGIS data from MapServer REST API}
14
14
  s.description = %q{Scrapes ArcGIS data from MapServer REST API}
15
+ s.required_ruby_version = '>= 2.0'
15
16
  s.license = "MIT"
16
17
 
17
18
  s.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(spec)/}) }
@@ -25,4 +26,5 @@ Gem::Specification.new do |s|
25
26
 
26
27
  s.add_runtime_dependency 'mechanize', '~> 2.7'
27
28
  s.add_runtime_dependency 'parallel', '~> 1.6'
29
+ s.add_development_dependency 'pg', '~> 0.18'
28
30
  end
@@ -12,6 +12,8 @@ class Layer
12
12
  end
13
13
 
14
14
  class UnknownLayerType < StandardError; end
15
+ class NoDatabase < StandardError; end
16
+ class OgrMissing < StandardError; end
15
17
 
16
18
  attr_reader :type, :id, :name
17
19
 
@@ -21,8 +23,21 @@ class Layer
21
23
  'Annotation SubLayer']
22
24
  QUERYABLE = ['Feature Layer', 'Annotation Layer']
23
25
 
24
- def initialize(url, path = '.')
25
- @url, @path = url, File.expand_path(path)
26
+ CONN = [:host, :port, :dbname, :user, :password] # PG connection options
27
+
28
+ GEOM_TYPES = {esriGeometryPoint: 'POINT',
29
+ esriGeometryMultipoint: 'MULTIPOINT',
30
+ esriGeometryLine: 'LINESTRING',
31
+ esriGeometryPolyline: 'MULTILINESTRING',
32
+ esriGeometryPolygon: 'MULTIPOLYGON'}
33
+
34
+
35
+ OGR2OGR = 'ogr2ogr -f "PostgreSQL" PG:'
36
+
37
+ def initialize(url, output_path = nil)
38
+ @conn_hash = CONN.zip(CONN.map { |key| GisScraper.config[key] }).to_h
39
+ @url = url
40
+ @output_path = output_path || config_path
26
41
  @ms_url = ms_url # map server url ending '../MapServer'
27
42
  @id = id
28
43
  @agent = Mechanize.new
@@ -33,12 +48,32 @@ class Layer
33
48
  @name = name
34
49
  end
35
50
 
36
- def write
51
+ def output_json
37
52
  QUERYABLE.any? { |l| @type == l } ? write_json_files : process_sub_layers
38
53
  end
39
54
 
55
+ def output_to_db
56
+ raise OgrMissing.new, 'ogr2ogr missing, is GDAL installed?' if !ogr2ogr?
57
+ raise NoDatabase.new, "No db connection: #{@conn_hash.inspect}" if !db?
58
+ @output_path = 'tmp' # write all files to the Gem's tmp dir
59
+ output_json
60
+ write_json_files_to_db_tables
61
+ end
62
+
40
63
  private
41
64
 
65
+ def db?
66
+ PG.connect(@conn_hash) rescue nil
67
+ end
68
+
69
+ def ogr2ogr?
70
+ `ogr2ogr --version` rescue nil
71
+ end
72
+
73
+ def config_path
74
+ File.expand_path GisScraper.config[:output_path]
75
+ end
76
+
42
77
  def ms_url
43
78
  @url.split('/')[0..-2].join('/')
44
79
  end
@@ -78,20 +113,53 @@ class Layer
78
113
  end
79
114
 
80
115
  def write_json_files
81
- File.write "#{@path}/#{@name}.json", json_data("#{@ms_url}/#{@id}")
116
+ File.write "#{@output_path}/#{@name}.json", json_data("#{@ms_url}/#{@id}")
117
+ end
118
+
119
+ def write_json_files_to_db_tables
120
+ files.each do |f|
121
+ `#{OGR2OGR}"#{conn}" "#{f}" -nln #{base(f)} #{srs} -nlt #{geom(f)}`
122
+ end
123
+ end
124
+
125
+ def geom(file)
126
+ esri = esri_geom(file)
127
+ GEOM_TYPES[esri.to_sym] || raise("Unknown geometry type: '#{esri}'")
128
+ end
129
+
130
+ def esri_geom(file)
131
+ JSON.parse(File.read(file))['geometryType']
132
+ end
133
+
134
+ def srs
135
+ return '' unless GisScraper.config[:srs]
136
+ "-a_srs #{GisScraper.config[:srs]}" || ''
137
+ end
138
+
139
+ def base(full_file_name)
140
+ full_file_name.split('/').last[0..-6].downcase
141
+ end
142
+
143
+ def files
144
+ Dir.glob('tmp/**/*.json')
145
+ end
146
+
147
+ def conn
148
+ host, port, db, user, pwd = *@conn_hash.values
149
+ "host=#{host} port=#{port} dbname=#{db} user=#{user} password=#{pwd}"
82
150
  end
83
151
 
84
152
  def process_sub_layers
85
153
  sub_layer_id_names.each do |hash|
86
154
  name, id = hash['name'], hash['id']
87
- path = "#{@path}/#{name}"
88
- recurse sub_layer(id, path), path
155
+ path = "#{@output_path}/#{name}"
156
+ recurse_json sub_layer(id, path), path
89
157
  end
90
158
  end
91
159
 
92
- def recurse(layer, dir)
160
+ def recurse_json(layer, dir)
93
161
  FileUtils.mkdir dir
94
- layer.write
162
+ layer.output_json
95
163
  end
96
164
 
97
165
  def sub_layer(id, path)
@@ -1,3 +1,3 @@
1
1
  module GisScraper
2
- VERSION = '0.1.2.pre'
2
+ VERSION = '0.1.3.pre'
3
3
  end
data/lib/gis_scraper.rb CHANGED
@@ -1,6 +1,7 @@
1
1
  require 'yaml'
2
2
  require 'mechanize'
3
3
  require 'parallel'
4
+ require 'pg'
4
5
 
5
6
  require 'gis_scraper/version'
6
7
  require 'gis_scraper/feature_scraper'
@@ -9,7 +10,9 @@ require 'gis_scraper/layer'
9
10
  # stackoverflow.com/questions/6233124/where-to-place-access-config-file-in-gem
10
11
  module GisScraper
11
12
 
12
- @config = {threads: 8} # threads used for scraping
13
+ @config = {threads: 8, output_path: '~/Desktop',
14
+ host: 'localhost', port: 5432, dbname: 'postgres', user: 'postgres', password: nil,
15
+ srs: nil}
13
16
  @valid_keys = @config.keys
14
17
 
15
18
  def self.configure(opts = {})
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: gis_scraper
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2.pre
4
+ version: 0.1.3.pre
5
5
  platform: ruby
6
6
  authors:
7
7
  - Bruce Steedman
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2015-12-27 00:00:00.000000000 Z
11
+ date: 2015-12-29 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -80,11 +80,24 @@ dependencies:
80
80
  - - "~>"
81
81
  - !ruby/object:Gem::Version
82
82
  version: '1.6'
83
+ - !ruby/object:Gem::Dependency
84
+ name: pg
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - "~>"
88
+ - !ruby/object:Gem::Version
89
+ version: '0.18'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - "~>"
95
+ - !ruby/object:Gem::Version
96
+ version: '0.18'
83
97
  description: Scrapes ArcGIS data from MapServer REST API
84
98
  email:
85
99
  - bruce.steedman@gmail.com
86
- executables:
87
- - gisget
100
+ executables: []
88
101
  extensions: []
89
102
  extra_rdoc_files: []
90
103
  files:
@@ -97,7 +110,6 @@ files:
97
110
  - Rakefile
98
111
  - bin/console
99
112
  - bin/setup
100
- - exe/gisget
101
113
  - gis_scraper.gemspec
102
114
  - lib/gis_scraper.rb
103
115
  - lib/gis_scraper/feature_scraper.rb
@@ -115,7 +127,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
115
127
  requirements:
116
128
  - - ">="
117
129
  - !ruby/object:Gem::Version
118
- version: '0'
130
+ version: '2.0'
119
131
  required_rubygems_version: !ruby/object:Gem::Requirement
120
132
  requirements:
121
133
  - - ">"
data/exe/gisget DELETED
@@ -1,7 +0,0 @@
1
- #!/usr/bin/env ruby
2
-
3
- require 'gis_scraper'
4
-
5
- start = Time.now
6
- Layer.new(*ARGV).write
7
- puts "Finished in #{Time.now - start} seconds"