spatial_stats 0.1.1 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 56905c84e53f041ed8acc1437d7d3ee1312ff9515f3e11e08bd31120e7ed0c26
4
- data.tar.gz: 2879c58f356b430285d55717cba5fc27f239097c35067032bd9d4407688c5086
3
+ metadata.gz: 8ec49f792086ad98d8c07da5dde57041085df82d491b168bf66aebac1fab9cb4
4
+ data.tar.gz: 134adbb6d387487c87b1fee3c4529e56c1228dd4dd44d01f2822dadb1aaa9955
5
5
  SHA512:
6
- metadata.gz: 7fd244b8cc798e4806d6c380bd3e1e261da3f8f509ed87f4f920f9eb706cb8d94150e1a915cc3f73417196ae8d91fc29579d6604f05c38cbd4c1a82b6cb34e48
7
- data.tar.gz: 205d0850f885b0e0d673d51976d0d69d7357bd2a2f0382aaa0140f096fd5164db1cce7f1fc0c31cb204b97153f75cb1a18ff9d4120c905d2f73c4a9b247659cb
6
+ metadata.gz: e02e5a207e8dae161aac12fd55f7325634a15dbd2485b521125e7ae41276b022eb7fca9ecfca051d5aded322d07d715bc1788e83940a2c4bab84a4bd5ae6e022
7
+ data.tar.gz: 3932e7019b3989c85476632aa740d3f48ce7b7467be2a78daefb45d4038c6ba7a21cd1d3d6704331b6021ea22e5314faa1bdeee24c91d50bbd90a7c5aa5edeba
data/README.md CHANGED
@@ -1,6 +1,8 @@
1
+ [![Build Status](https://travis-ci.com/keithdoggett/spatial_stats.svg?branch=master)](https://travis-ci.com/keithdoggett/spatial_stats)
2
+
1
3
  # SpatialStats
2
4
 
3
- Short description and motivation.
5
+ SpatialStats is an ActiveRecord plugin that utilizes PostGIS and Ruby to compute weights/statistics of spatial data sets in Rails Apps.
4
6
 
5
7
  ## Installation
6
8
 
@@ -24,13 +26,178 @@ $ gem install spatial_stats
24
26
 
25
27
  ## Usage
26
28
 
27
- How to use my plugin.
29
+ ### Weights
30
+
31
+ Weights define the spatial relation between members of a dataset. Contiguous operations are supported for `polygons` and `multipolygons`, and distant operations are supported for `points`.
32
+
33
+ To compute weights, you need an `ActiveRecord::Relation` scope and a geometry field. From there, you can pick what type of weight operation to compute (`knn`, `queen neighbors`, etc.).
34
+
35
+ #### Compute Queen Weights
36
+
37
+ ```ruby
38
+ # County table has the following fields: avg_income: float, geom: multipolygon.
39
+ scope = County.all
40
+ geom_field = :geom
41
+ weights = SpatialStats::Weights::Contiguous.queen(scope, geom_field)
42
+ # => #<SpatialStats::Weights::WeightsMatrix>
43
+ ```
44
+
45
+ #### Compute KNN of Centroids
46
+
47
+ The field being queried does not have to be defined in the schema; it can be computed in the query that defines the scope.
48
+
49
+ This example finds the inverse distance weighted, 5 nearest neighbors for the centroid of each county.
50
+
51
+ ```ruby
52
+ scope = County.all.select("*, st_centroid(geom) as geom")
53
+ weights = SpatialStats::Weights::Distant.idw_knn(scope, :geom, 5)
54
+ # => #<SpatialStats::Weights::WeightsMatrix>
55
+ ```
56
+
57
+ #### Define WeightsMatrix without Query
58
+
59
+ Weight matrices can be defined by a hash that describes each key's neighbor and weight.
60
+
61
+ Note: Currently, the keys must be numeric.
62
+
63
+ Example: Define WeightsMatrix and get the matrix in row_standardized format.
64
+
65
+ ```ruby
66
+ weights = {
67
+ 1 => [{ j_id: 2, weight: 1 }, { j_id: 4, weight: 1 }],
68
+ 2 => [{ j_id: 1, weight: 1 }],
69
+ 3 => [{ j_id: 4, weight: 1 }],
70
+ 4 => [{ j_id: 1, weight: 1 }, { j_id: 3, weight: 1 }]
71
+ }
72
+ keys = weights.keys
73
+ wm = SpatialStats::Weights::WeightsMatrix.new(keys, weights)
74
+ # => #<SpatialStats::Weights::WeightsMatrix:0x0000561e205677c0 @keys=[1, 2, 3, 4], @weights={1=>[{:j_id=>2, :weight=>1}, {:j_id=>4, :weight=>1}], 2=>[{:j_id=>1, :weight=>1}], 3=>[{:j_id=>4, :weight=>1}], 4=>[{:j_id=>1, :weight=>1}, {:j_id=>3, :weight=>1}]}, @n=4>
75
+
76
+ wm.standardized
77
+ # => Numo::DFloat#shape=[4,4]
78
+ #[[0, 0.5, 0, 0.5],
79
+ # [1, 0, 0, 0],
80
+ # [0, 0, 0, 1],
81
+ # [0.5, 0, 0.5, 0]]
82
+ ```
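Reviewer note: the row-standardized output above can be reproduced in plain Ruby, which may help when checking a hand-built weights hash. This is a minimal sketch, not the gem's implementation; `row_standardize` is a hypothetical helper, and it returns nested arrays rather than a `Numo::DFloat`.

```ruby
# Hypothetical helper (not part of spatial_stats): builds a dense,
# row-standardized matrix from a neighbor hash so that each non-empty
# row sums to 1.
def row_standardize(keys, weights)
  index = keys.each_with_index.to_h # key => column position
  keys.map do |key|
    row = Array.new(keys.size, 0.0)
    neighbors = weights.fetch(key, [])
    total = neighbors.sum { |nb| nb[:weight] }.to_f
    neighbors.each do |nb|
      row[index.fetch(nb[:j_id])] = nb[:weight] / total
    end
    row
  end
end

weights = {
  1 => [{ j_id: 2, weight: 1 }, { j_id: 4, weight: 1 }],
  2 => [{ j_id: 1, weight: 1 }],
  3 => [{ j_id: 4, weight: 1 }],
  4 => [{ j_id: 1, weight: 1 }, { j_id: 3, weight: 1 }]
}
matrix = row_standardize(weights.keys, weights)
# => [[0.0, 0.5, 0.0, 0.5],
#     [1.0, 0.0, 0.0, 0.0],
#     [0.0, 0.0, 0.0, 1.0],
#     [0.5, 0.0, 0.5, 0.0]]
```

Each neighbor's weight is divided by the sum of its row, which is why key 1 (two neighbors) gets 0.5 per neighbor while key 2 (one neighbor) gets 1.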
83
+
84
+ ### Lagged Variables
85
+
86
+ Spatially lagged variables can be computed with a 2-D n x n `Numo::NArray` and 1-D vector (`Array` or `Numo::NArray`).
87
+
88
+ #### Compute a Lagged Variable
89
+
90
+ ```ruby
91
+ w = Numo::DFloat[[0, 0.5, 0, 0.5],
92
+ [1, 0, 0, 0],
93
+ [0, 0, 0, 1],
94
+ [0.5, 0, 0.5, 0]]
95
+ vec = [1, 2, 3, 4]
96
+ lagged_var = SpatialStats::Utils::Lag.neighbor_sum(w, vec)
97
+ # => [3.0, 1.0, 4.0, 2.0]
98
+ ```
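Reviewer note: the lagged variable in this example is the matrix-vector product of the weights and the data. The sketch below reproduces the result with plain arrays; `neighbor_sum` here is a hypothetical stand-in written for illustration, not the body of `SpatialStats::Utils::Lag.neighbor_sum`.

```ruby
# Hypothetical stand-in for the lag computation: for each row i,
# sum w[i][j] * vector[j] over all columns j.
def neighbor_sum(matrix, vector)
  matrix.map do |row|
    row.each_with_index.sum { |w_ij, j| w_ij * vector[j] }
  end
end

w = [[0.0, 0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.5, 0.0, 0.5, 0.0]]
vec = [1, 2, 3, 4]
lagged = neighbor_sum(w, vec)
# => [3.0, 1.0, 4.0, 2.0]
```

With row-standardized weights, each entry is the weighted average of that observation's neighbors: the first entry is 0.5 * 2 + 0.5 * 4 = 3.0.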
99
+
100
+ ### Global Stats
101
+
102
+ Global stats compute a value for the dataset, like how clustered the observations are within the region.
103
+
104
+ Most `stat` classes take three parameters: `scope`, `data_field`, and `weights`. All `stat` classes have the `stat` method that will compute the target statistic. These are also aliased with the common name of the statistic, such as `i` for `Moran` or `c` for `Geary`.
105
+
106
+ #### Compute Moran's I
107
+
108
+ ```ruby
109
+ scope = County.all
110
+ weights = SpatialStats::Weights::Contiguous.rook(scope, :geom)
111
+ moran = SpatialStats::Global::Moran.new(scope, :avg_income, weights)
112
+ # => <SpatialStats::Global::Moran>
113
+
114
+ moran.stat
115
+ # => 0.834
116
+
117
+ moran.i
118
+ # => 0.834
119
+ ```
120
+
121
+ #### Compute Moran's I Z-Score
122
+
123
+ ```ruby
124
+ scope = County.all
125
+ weights = SpatialStats::Weights::Contiguous.rook(scope, :geom)
126
+ moran = SpatialStats::Global::Moran.new(scope, :avg_income, weights)
127
+ # => <SpatialStats::Global::Moran>
128
+
129
+ moran.z_score
130
+ # => 3.2
131
+ ```
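Reviewer note: a z-score of this kind is conventionally the statistic minus its expected value, divided by the square root of its variance. The sketch below assumes that definition; the diff shows `expectation` and `variance` but not the body of `z_score`, so treat this as an illustration. The input values are made up.

```ruby
# Assumed definition: z = (I - E[I]) / sqrt(Var[I]).
def z_score(stat, expectation, variance)
  (stat - expectation) / Math.sqrt(variance)
end

n = 5                        # hypothetical number of observations
expectation = -1.0 / (n - 1) # same form as Moran#expectation in this diff
z = z_score(0.5, expectation, 0.0625) # hypothetical I and variance
# => 3.0
```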
132
+
133
+ #### Run a Permutation Test on Moran's I
134
+
135
+ All stat classes have the `mc` method, which takes `permutations` and `seed` as its parameters. `mc` runs a permutation test on the class and returns the pseudo p-value.
136
+
137
+ ```ruby
138
+ scope = County.all
139
+ weights = SpatialStats::Weights::Contiguous.rook(scope, :geom)
140
+ moran = SpatialStats::Global::Moran.new(scope, :avg_income, weights)
141
+ # => <SpatialStats::Global::Moran>
142
+
143
+ moran.mc(999, 123_456)
144
+ # => 0.003
145
+ ```
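Reviewer note: the permutation test can be sketched without a database. The code below is an illustration under stated assumptions, not the gem's `mc`: `morans_i` and `pseudo_p_value` are hypothetical names, extremeness is judged two-sided by absolute value, and the pseudo p-value uses the common (extreme + 1) / (permutations + 1) form. The gem's own comparison rule may differ.

```ruby
# Moran's I with row-standardized weights: sum(z_i * lag(z)_i) / sum(z_i**2),
# where z is x centered on its mean.
def morans_i(w, x)
  mean = x.sum / x.size.to_f
  z = x.map { |v| v - mean }
  z_lag = w.map { |row| row.each_with_index.sum { |w_ij, j| w_ij * z[j] } }
  numerator = z.each_with_index.sum { |zi, i| zi * z_lag[i] }
  denominator = z.sum { |zi| zi**2 }
  numerator / denominator
end

# Shuffle x, recompute I, and count how often the shuffled I is at least
# as extreme as the observed one (assumption: two-sided comparison).
def pseudo_p_value(w, x, permutations = 99, seed = nil)
  rng = seed ? Random.new(seed) : Random.new
  observed = morans_i(w, x).abs
  extreme = permutations.times.count do
    morans_i(w, x.shuffle(random: rng)).abs >= observed
  end
  (extreme + 1.0) / (permutations + 1)
end

w = [[0.0, 0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.5, 0.0, 0.5, 0.0]]
x = [10.0, 12.0, 30.0, 28.0] # made-up values
p_value = pseudo_p_value(w, x, 99, 123_456)
```

Passing the same seed makes the shuffles, and therefore the pseudo p-value, reproducible between runs.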
146
+
147
+ ### Local Stats
148
+
149
+ Local stats compute a value for each observation in the dataset, like how similar its neighbors are to itself. Local stats operate similarly to global stats, except that almost every operation will return an array of length `n` where `n` is the number of observations in the dataset.
150
+
151
+ Most `stat` classes take three parameters: `scope`, `data_field`, and `weights`. All `stat` classes have the `stat` method that will compute the target statistic. These are also aliased with the common name of the statistic, such as `i` for `Moran` or `c` for `Geary`.
152
+
153
+ #### Compute Moran's I
154
+
155
+ ```ruby
156
+ scope = County.all
157
+ weights = SpatialStats::Weights::Contiguous.rook(scope, :geom)
158
+ moran = SpatialStats::Local::Moran.new(scope, :avg_income, weights)
159
+ # => <SpatialStats::Local::Moran>
160
+
161
+ moran.stat
162
+ # => [0.888, 0.675, 0.2345, -0.987, -0.42, ...]
163
+
164
+ moran.i
165
+ # => [0.888, 0.675, 0.2345, -0.987, -0.42, ...]
166
+ ```
167
+
168
+ #### Compute Moran's I Z-Scores
169
+
170
+ Note: Many classes do not have a variance or expectation method implemented and this will raise a `NotImplementedError`.
171
+
172
+ ```ruby
173
+ scope = County.all
174
+ weights = SpatialStats::Weights::Contiguous.rook(scope, :geom)
175
+ moran = SpatialStats::Local::Moran.new(scope, :avg_income, weights)
176
+ # => <SpatialStats::Local::Moran>
177
+
178
+ moran.z_score
179
+ # => [0.65, 1.23, 0.42, 3.45, -0.34, ...]
180
+ ```
181
+
182
+ #### Run a Permutation Test on Moran's I
183
+
184
+ All stat classes have the `mc` method, which takes `permutations` and `seed` as its parameters. `mc` runs a permutation test on the class and returns the pseudo p-values.
185
+
186
+ ```ruby
187
+ scope = County.all
188
+ weights = SpatialStats::Weights::Contiguous.rook(scope, :geom)
189
+ moran = SpatialStats::Local::Moran.new(scope, :avg_income, weights)
190
+ # => <SpatialStats::Local::Moran>
191
+
192
+ moran.mc(999, 123_456)
193
+ # => [0.24, 0.13, 0.53, 0.023, 0.65, ...]
194
+ ```
28
195
 
29
196
  ## Contributing
30
197
 
31
198
  Once cloned, run the following commands to set up the test database.
32
199
 
33
- ```sh
200
+ ```bash
34
201
  cd ./spatial_stats
35
202
  bundle install
36
203
  cd test/dummy
@@ -40,7 +207,7 @@ rake db:migrate
40
207
 
41
208
  If you are getting an error, you may need to set the following environment variables.
42
209
 
43
- ```
210
+ ```bash
44
211
  $PGUSER # default "postgres"
45
212
  $PGPASSWORD # default ""
46
213
  $PGHOST # default "127.0.0.1"
@@ -50,7 +217,7 @@ $PGDATABASE # default "spatial_stats_test"
50
217
 
51
218
  If the dummy app is set up correctly, run the following:
52
219
 
53
- ```
220
+ ```bash
54
221
  cd ../..
55
222
  rake
56
223
  ```
@@ -59,7 +226,7 @@ This will run the tests. If they all pass, then your environment is setup correc
59
226
 
60
227
  Note: It is recommended to have GEOS installed and linked to RGeo. You can test this by running the following:
61
228
 
62
- ```
229
+ ```bash
63
230
  cd test/dummy
64
231
  rails c
65
232
 
@@ -71,8 +238,11 @@ RGeo::Geos.supported?
71
238
 
72
239
  - ~~Memoize expensive functions within classes~~
73
240
  - ~~Make star a parameter to getis-ord class~~
74
- - Add examples to docs
75
- - Create RDocs
241
+ - ~~Add examples/usage to docs~~
242
+ - ~~Create RDocs~~
243
+ - Refactor Global Moran and BVMoran
244
+ - Support non-numeric keys in WeightsMatrix/General refactor
245
+ - Write SparseMatrix C ext
76
246
 
77
247
  ## Future Work
78
248
 
@@ -80,16 +250,22 @@ RGeo::Geos.supported?
80
250
 
81
251
  - ~~Refactor stats to inherit an abstract class.~~
82
252
  - Change WeightsMatrix class and Stat classes to utilize sparse matrix methods.
253
+ - Split into two separate gems spatial_stats and spatial_stats-activerecord
83
254
 
84
255
  #### Weights
85
256
 
86
- - Add Kernel based weighting.
257
+ - Add Kernel based weighting
87
258
 
88
259
  #### Utils
89
260
 
90
261
  - Rate smoothing
91
262
  - Bayes smoothing
92
263
 
264
+ #### Global
265
+
266
+ - Geary class
267
+ - GetisOrd class
268
+
93
269
  #### Local
94
270
 
95
271
  - Join Count Statistic
@@ -9,10 +9,13 @@ require 'spatial_stats/queries'
9
9
  require 'spatial_stats/utils'
10
10
  require 'spatial_stats/weights'
11
11
 
12
+ ##
13
+ # SpatialStats is an ActiveRecord/PostGIS gem that provides descriptive spatial
14
+ # stats to your application.
12
15
  module SpatialStats
13
- def self.included(klass)
14
- puts 'here', klass
15
- # klass.extend(SpatialStats::Queries::Weights)
16
- end
16
+ # def self.included(klass)
17
+ # puts 'here', klass
18
+ # # klass.extend(SpatialStats::Queries::Weights)
19
+ # end
17
20
  # Your code goes here...
18
21
  end
@@ -1,6 +1,18 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ ##
4
+ # Extension to the Enumerable module
3
5
  module Enumerable
6
+ ##
7
+ # Standardize works with a numeric array and transforms each value so that
8
+ # the mean is 0 and the variance is 1.
9
+ # Formula is (x - mean)/stdev
10
+ #
11
+ # @example
12
+ # [1,2,3].standardize
13
+ # [-1.0, 0.0, 1.0]
14
+ #
15
+ # @return [Array] the standardized array
4
16
  def standardize
5
17
  # standardize is (variable - mean)/stdev
6
18
  m = mean
@@ -8,10 +20,27 @@ module Enumerable
8
20
  map { |v| (v - m) / std }
9
21
  end
10
22
 
23
+ ##
24
+ # Mean works with a numeric array and returns the arithmetic mean.
25
+ #
26
+ # @example
27
+ # [1,2,3].mean
28
+ # 2.0
29
+ #
30
+ # @return [Float] the arithmetic mean
11
31
  def mean
12
32
  sum / size.to_f
13
33
  end
14
34
 
35
+ ##
36
+ # Sample Variance works with a numeric array and returns the variance.
37
+ # Formula for variance is sum((x - mean)**2)/(n - 1)
38
+ #
39
+ # @example
40
+ # [1,2,3].sample_variance
41
+ # 1.0
42
+ #
43
+ # @return [Float] the sample variance
15
44
  def sample_variance
16
45
  m = mean
17
46
  numerator = sum { |v| (v - m)**2 }
@@ -3,3 +3,18 @@
3
3
  require 'spatial_stats/global/stat'
4
4
  require 'spatial_stats/global/bivariate_moran'
5
5
  require 'spatial_stats/global/moran'
6
+
7
+ module SpatialStats
8
+ ##
9
+ # The Global module provides functionality for global spatial statistics.
10
+ # Global spatial statistics describe the entire dataset with one value,
11
+ # like how clustered the observations are across the entire dataset.
12
+ #
13
+ # All global classes define a +stat+ method that returns the described
14
+ # statistic and an +mc+ method that runs a permutation test to determine a
15
+ # pseudo p-value for the statistic. Some also define +variance+ and
16
+ # +z_score+ methods that can be used to calculate p-values if the
17
+ # distribution is known.
18
+ module Global
19
+ end
20
+ end
@@ -1,9 +1,20 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- # https://geodacenter.github.io/workbook/5b_global_adv/lab5b.html
4
3
  module SpatialStats
5
4
  module Global
5
+ ##
6
+ # BivariateMoran computes the correlation between a variable x and
7
+ # a spatially lagged variable y.
6
8
  class BivariateMoran < Stat
9
+ ##
10
+ # A new instance of BivariateMoran
11
+ #
12
+ # @param [ActiveRecord::Relation] scope
13
+ # @param [Symbol, String] x_field to query from scope
14
+ # @param [Symbol, String] y_field to query from scope
15
+ # @param [WeightsMatrix] weights to define relationship between observations in scope
16
+ #
17
+ # @return [BivariateMoran]
7
18
  def initialize(scope, x_field, y_field, weights)
8
19
  @scope = scope
9
20
  @x_field = x_field
@@ -12,7 +23,12 @@ module SpatialStats
12
23
  end
13
24
  attr_writer :x, :y
14
25
 
15
- def i
26
+ ##
27
+ # Computes the global spatial correlation of x against spatially lagged
28
+ # y.
29
+ #
30
+ # @return [Float]
31
+ def stat
16
32
  w = @weights.standardized
17
33
  y_lag = SpatialStats::Utils::Lag.neighbor_sum(w, y)
18
34
  numerator = 0
@@ -23,13 +39,22 @@ module SpatialStats
23
39
  denominator = x.sum { |xi| xi**2 }
24
40
  numerator / denominator
25
41
  end
42
+ alias i stat
26
43
 
44
+ ##
45
+ # The expected value of +#stat+.
46
+ # @see https://en.wikipedia.org/wiki/Moran%27s_I#Expected_value
47
+ # @return [Float]
27
48
  def expectation
28
49
  -1.0 / (@weights.n - 1)
29
50
  end
30
51
 
52
+ ##
53
+ # The variance of the bivariate spatial correlation.
54
+ # @see https://en.wikipedia.org/wiki/Moran%27s_I#Expected_value
55
+ #
56
+ # @return [Float]
31
57
  def variance
32
- # https://en.wikipedia.org/wiki/Moran%27s_I#Expected_value
33
58
  n = @weights.n
34
59
  wij = @weights.full
35
60
  w = wij.sum
@@ -47,16 +72,35 @@ module SpatialStats
47
72
  var_left - var_right
48
73
  end
49
74
 
75
+ ##
76
+ # Permutation test to determine a pseudo p-value of the computed I stat.
77
+ # Shuffles y values while holding x constant, recomputes I for each
78
+ # variation, then compares that I value to the computed one.
79
+ # The ratio of more extreme values to permutations is returned.
80
+ #
81
+ # @see https://geodacenter.github.io/glossary.html#perm
82
+ #
83
+ # @param [Integer] permutations to run. Last digit should be 9 to produce round numbers.
84
+ # @param [Integer] seed used in random number generator for shuffles.
85
+ #
86
+ # @return [Float]
50
87
  def mc(permutations = 99, seed = nil)
51
- # call super monte carlo for multivariate
52
88
  mc_bv(permutations, seed)
53
89
  end
54
90
 
91
+ ##
92
+ # Standardized variables queried from +x_field+.
93
+ #
94
+ # @return [Array]
55
95
  def x
56
96
  @x ||= SpatialStats::Queries::Variables.query_field(@scope, @x_field)
57
97
  .standardize
58
98
  end
59
99
 
100
+ ##
101
+ # Standardized variables queried from +y_field+.
102
+ #
103
+ # @return [Array]
60
104
  def y
61
105
  @y ||= SpatialStats::Queries::Variables.query_field(@scope, @y_field)
62
106
  .standardize
@@ -2,37 +2,62 @@
2
2
 
3
3
  module SpatialStats
4
4
  module Global
5
+ ##
6
+ # Moran's I statistic computes the spatial autocorrelation of variable x.
7
+ # It does this by computing a spatially lagged version of itself and
8
+ # comparing that with each observation based on the weights matrix.
5
9
  class Moran < Stat
10
+ ##
11
+ # A new instance of Moran
12
+ #
13
+ # @param [ActiveRecord::Relation] scope
14
+ # @param [Symbol, String] field to query from scope
15
+ # @param [WeightsMatrix] weights to define relationship between observations in scope
16
+ #
17
+ # @return [Moran]
6
18
  def initialize(scope, field, weights)
7
19
  super(scope, field, weights)
8
20
  end
9
21
  attr_writer :x
10
22
 
11
- def i
23
+ ##
24
+ # Computes the global spatial autocorrelation of x against a spatially
25
+ # lagged x.
26
+ #
27
+ # @return [Float]
28
+ def stat
12
29
  # computes Moran's I. numerator is sum of zi * spatial lag of zi
13
30
  # denominator is sum of zi**2.
14
- # have to use row-standardized
15
- @i ||= begin
16
- w = @weights.standardized
17
- z_lag = SpatialStats::Utils::Lag.neighbor_sum(w, z)
18
- numerator = 0
19
- z.each_with_index do |zi, j|
20
- row_sum = zi * z_lag[j]
21
- numerator += row_sum
22
- end
23
-
24
- denominator = z.sum { |zi| zi**2 }
25
- numerator / denominator
31
+ # have to use row-standardized weights
32
+ w = @weights.standardized
33
+ z_lag = SpatialStats::Utils::Lag.neighbor_sum(w, z)
34
+ numerator = 0
35
+ z.each_with_index do |zi, j|
36
+ row_sum = zi * z_lag[j]
37
+ numerator += row_sum
26
38
  end
39
+
40
+ denominator = z.sum { |zi| zi**2 }
41
+ numerator / denominator
27
42
  end
43
+ alias i stat
28
44
 
45
+ ##
46
+ # The expected value of +#stat+.
47
+ # @see https://en.wikipedia.org/wiki/Moran%27s_I#Expected_value
48
+ #
49
+ # @return [Float]
29
50
  def expectation
30
51
  # -1/(n-1)
31
52
  -1.0 / (@weights.n - 1)
32
53
  end
33
54
 
55
+ ##
56
+ # The variance of the spatial correlation.
57
+ # @see https://en.wikipedia.org/wiki/Moran%27s_I#Expected_value
58
+ #
59
+ # @return [Float]
34
60
  def variance
35
- # https://en.wikipedia.org/wiki/Moran%27s_I#Expected_value
36
61
  n = @weights.n
37
62
  wij = @weights.full
38
63
  w = wij.sum
@@ -50,18 +75,43 @@ module SpatialStats
50
75
  var_left - var_right
51
76
  end
52
77
 
78
+ ##
79
+ # Permutation test to determine a pseudo p-value of the computed I stat.
80
+ # Shuffles x values and recomputes I for each variation, then compares that I
81
+ # value to the computed one. The ratio of more extreme values to
82
+ # permutations is returned.
83
+ #
84
+ # @see https://geodacenter.github.io/glossary.html#perm
85
+ #
86
+ # @param [Integer] permutations to run. Last digit should be 9 to produce round numbers.
87
+ # @param [Integer] seed used in random number generator for shuffles.
88
+ #
89
+ # @return [Float]
53
90
  def mc(permutations = 99, seed = nil)
54
91
  super(permutations, seed)
55
92
  end
56
93
 
94
+ ##
95
+ # Values of the +field+ queried from the +scope+
96
+ #
97
+ # @return [Array]
57
98
  def x
58
99
  @x ||= SpatialStats::Queries::Variables.query_field(@scope, @field)
59
100
  end
60
101
 
102
+ # TODO: remove these last 2 methods and just standardize x.
103
+ ##
104
+ # Mean of x
105
+ #
106
+ # @return [Float]
61
107
  def zbar
62
108
  x.sum / x.size
63
109
  end
64
110
 
111
+ ##
112
+ # Array of xi - zbar for i: [0:n-1]
113
+ #
114
+ # @return [Array]
65
115
  def z
66
116
  x.map { |val| val - zbar }
67
117
  end
@@ -76,12 +126,12 @@ module SpatialStats
76
126
 
77
127
  def s2_calc(n, wij)
78
128
  s2 = 0
79
- (0..n - 1).each do |i|
129
+ (0..n - 1).each do |idx|
80
130
  left_term = 0
81
131
  right_term = 0
82
132
  (0..n - 1).each do |j|
83
- left_term += wij[i, j]
84
- right_term += wij[j, i]
133
+ left_term += wij[idx, j]
134
+ right_term += wij[j, idx]
85
135
  end
86
136
  s2 += (left_term + right_term)**2
87
137
  end
@@ -90,9 +140,9 @@ module SpatialStats
90
140
 
91
141
  def s1_calc(n, wij)
92
142
  s1 = 0
93
- (0..n - 1).each do |i|
143
+ (0..n - 1).each do |idx|
94
144
  (0..n - 1).each do |j|
95
- s1 += (wij[i, j] + wij[j, i])**2
145
+ s1 += (wij[idx, j] + wij[j, idx])**2
96
146
  end
97
147
  end
98
148
  s1 / 2