RubyGems - proptax - Versions diffs - 0.0.1 - Mend

proptax 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

checksums.yaml +7 -0
data/.gitignore +16 -0
data/.rspec +2 -0
data/.travis.yml +5 -0
data/Gemfile +4 -0
data/LICENSE +674 -0
data/README.md +161 -0
data/Rakefile +6 -0
data/bin/console +14 -0
data/bin/setup +8 -0
data/exe/proptax +3 -0
data/lib/R/filter_csv.R +68 -0
data/lib/proptax/cli.rb +89 -0
data/lib/proptax/generators/report/cherry-picked.Rmd +296 -0
data/lib/proptax/generators/report/default.Rmd +292 -0
data/lib/proptax/generators/report.rb +31 -0
data/lib/proptax/version.rb +3 -0
data/lib/proptax.rb +153 -0
data/proptax.gemspec +38 -0
metadata +136 -0

data/README.md ADDED

@@ -0,0 +1,161 @@
+# Proptax
+Process property assessment reports provided by the City of Calgary, then automatically report on and visualize discrepancies in the data.
+I currently have three victories before Calgary's [Assessment Review Board](http://www.calgaryarb.ca/eCourtPublic/). Go to [TaxReformYYC](https://taxreformyyc.com) for more information.
+This software automatically generates the reports I submit as evidence before the ARB.
+# Setup
+`proptax` is a command line `ruby` program developed under Ubuntu 16.04. It is free to use and entirely open source, so it is probably deployable on MacOS and maybe Windows with some massaging. If you figure it out, please document the process and submit a pull request. I will gladly add your contribution to this software.
+## Dependencies
+`proptax` combines and coordinates the output of multiple open source resources. In broad terms, it requires the following packages to generate the reports:
+1. `gs`
+2. `tesseract`
+3. `enscript`
+4. `pandoc`
+5. `R`
+The following commands will install all third party dependencies:
+```
+sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
+sudo add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/'
+sudo apt update
+sudo apt install -y ghostscript tesseract-ocr enscript pandoc r-base r-base-dev r-cran-scales libmagick++-dev mesa-common-dev libglu1-mesa-dev texlive-fonts-recommended texlive-latex-recommended
+```
+### R
+`R` does the bulk of the data processing. It has some dependencies that are not available from Ubuntu 16.04 (Xenial) PPAs. They need to be installed into the `R` environment directly.
+Execute the following to open the `R` command prompt:
+```
+R
+```
+Then, at the `>` prompt, execute the following `R` commands:
+```
+install.packages('knitr', dependencies = TRUE)
+install.packages('scales', dependencies = TRUE)
+install.packages('formattable', dependencies = TRUE)
+install.packages('ggplot2', dependencies = TRUE)
+```
+Assuming successful installation, you can exit `R` by pressing `Ctrl-D`.
+## Install proptax
+`proptax` is a `ruby` program. As such, you need to [install ruby](https://www.digitalocean.com/community/tutorials/how-to-install-ruby-on-rails-with-rbenv-on-ubuntu-16-04).
+You install the latest release of `proptax` like this:
+```
+gem install proptax
+```
+# Usage
+`proptax` reports on and visualizes the data contained in residential property reports provided by the City of Calgary. Your property report and those of your neighbours can be obtained at [assessmentsearch.calgary.ca](https://assessmentsearch.calgary.ca).
+Last year I made a whole series of super-boring [YouTube tutorials](https://www.youtube.com/playlist?list=PLkQAXLFkBnmiB8O06C2oGAoarBCVO7M9J) on how to collect and process your property data. The collection process has changed slightly, but [the first video](https://www.youtube.com/watch?v=m0zzsL0DYlI&list=PLkQAXLFkBnmiB8O06C2oGAoarBCVO7M9J&index=2) should point you in the right direction. You only get 50 reports per year for some reason, so use 'em all up (and send them to me)!
+TODO: More usage instructions to come...
+# Development
+Install third-party software as with _Setup > Dependencies_, above.
+Clone this repository:
+```
+git clone https://github.com/TaxReformYYC/report-generator-2018.git
+```
+Install `ruby` dependencies:
+```
+cd report-generator-2018
+bin/setup
+```
+### Test:
+```
+bundle exec rake spec
+```
+### To execute the `proptax` script within the development environment:
+```
+bundle exec exe/proptax
+```
+### To build the gem:
+```
+bundle exec rake build
+```
+The package will be found in the `pkg/` directory.
+### To install this gem onto your local machine, run:
+```
+bundle exec rake install
+```
+If that doesn't work, try this:
+```
+gem install pkg/proptax-0.1.0.gem # Note version number
+```
+Execute the program:
+```
+proptax
+```
+If installed correctly, you will see help instructions.
+### To release a new version:
+Update the version number in `version.rb`, and then run:
+```
+bundle exec rake release
+```
+This will create a `git` tag for the version, push `git` commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+# Contributing
+Bug reports and pull requests are welcome.
+## TODOs:
+- Speed up tests. Remove setup redundancies
+- Deploy auto CHANGELOG
+- DRY out `R` code
+- Deploy `tesseract` OCR on rasterized PDFs (as with Windows 7).
+- Custom report template documentation
+- Auto-install gem's third-party dependencies
+- Set up wiki for use on different operating systems
+- Dependencies require X11. It would be nice to run this on an Ubuntu 16.04 Server somehow
+Suggestions? Contribute or [donate](https://taxreformyyc.com/donate)!
+## Future:
+- Basic API for property data submission, collection, and retrieval
+# Licence
+GNU General Public License v3.0
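
The README's dependency list could be checked before running `proptax`. The following is a minimal sketch, not part of the gem: `missing_executables` and `REQUIRED_TOOLS` are hypothetical names, and the executable names (`gs`, `tesseract`, `enscript`, `pandoc`, `Rscript`) are assumed from the README and from the `Rscript` calls in `cli.rb`.

```ruby
# Hypothetical helper, not part of proptax: report which required
# command line tools cannot be found on the given PATH.
REQUIRED_TOOLS = %w[gs tesseract enscript pandoc Rscript].freeze

def missing_executables(names, path = ENV.fetch('PATH', ''))
  dirs = path.split(File::PATH_SEPARATOR)
  # Keep only the names for which no PATH directory holds an executable file
  names.reject do |name|
    dirs.any? do |dir|
      candidate = File.join(dir, name)
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

missing = missing_executables(REQUIRED_TOOLS)
if missing.empty?
  puts 'All third-party tools were found.'
else
  puts "Missing tools: #{missing.join(', ')}"
end
```

A check like this only confirms that the programs exist; it says nothing about whether the `R` packages (`knitr`, `scales`, `formattable`, `ggplot2`) are installed.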

data/Rakefile ADDED

@@ -0,0 +1,6 @@
+require "bundler/gem_tasks"
+require "rspec/core/rake_task"
+RSpec::Core::RakeTask.new(:spec)
+task :default => :spec

data/bin/console ADDED

@@ -0,0 +1,14 @@
+#!/usr/bin/env ruby
+require "bundler/setup"
+require "proptax"
+# You can add fixtures and/or initialization code here to make experimenting
+# with your gem easier. You can also use a different console, if you like.
+# (If you use this, don't forget to add pry to your Gemfile!)
+# require "pry"
+# Pry.start
+require "irb"
+IRB.start(__FILE__)

data/bin/setup ADDED

@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+IFS=$'\n\t'
+set -vx
+bundle install
+# Do any other automated setup that you need to do here

data/exe/proptax ADDED

@@ -0,0 +1,3 @@
+#!/usr/bin/env ruby
+require 'proptax/cli'
+Proptax::CLI.start

data/lib/R/filter_csv.R ADDED

@@ -0,0 +1,68 @@
+# How do these factors impact an assessment?
+unknownImpact <- c('Taxation.Status', 'Assessment.Class',
+                   'Property.Type', 'Property.Use', 'Valuation.Approach',
+                   'Market.Adjustment', 'Community', 'Market.Area', 'Sub.Neighbourhood.Code..SNC.',
+                   'Sub.Market.Area', 'Influences', 'Land.Use.Designation', 'Building.Count',
+                   'Building.Type.Structure', 'Year.of.Construction', 'Quality', 'Basement.Suite',
+                   'Walkout.Basement', 'Garage.Type', 'Fireplace.Count',
+                   'Constructed.On.Original.Foundation', 'Modified.For.Disabled',
+                   'Old.House.On.New.Foundation', 'Basementless', 'Penthouse')
+# These factors don't directly affect the assessment
+noImpact <- c('Roll.Number', 'Location.Address')
+# Load data
+csvFile <- commandArgs(trailingOnly = TRUE)
+data <- read.csv(csvFile, header=TRUE)
+# Identify common factors
+headers <- c()
+identical <- c()
+for (col in colnames(data)) {
+  values = data[,col][!duplicated(data[,col])]
+  if (length(values) == 1) {
+    headers <- append(headers, col)
+    identical <- append(identical, toString(values))
+  }
+}
+commonFactors <- data.frame(headers, identical)
+# Remove common factors
+data <- data[,!(names(data) %in% commonFactors$headers)]
+# Identify unknown factors
+unknownFactors <- names(data[,names(data) %in% unknownImpact])
+# Remove unknown factors from data set
+data <- data[,!(names(data) %in% unknownFactors)]
+# Remove irrelevant factors from data set, but preserve address for plot labels
+rowNames <- data[,noImpact[2]]
+data <- data[,!(names(data) %in% noImpact)]
+# Label assessment records
+rownames(data) <- rowNames
+# Sum each property's lot size and total developed space
+areaTotals <- rowSums(data[,-1])
+# Isolate all assessed values
+assessedValues <- data[,1]
+# Remove the street names from the addresses
+houseNumbers <- as.numeric(gsub("[^\\d]+", "", rowNames, perl=TRUE))
+# Fit the best fit regression line
+reg <- lm(assessedValues~areaTotals)
+# Compute distances between points and the regression line
+assessedDifferences <- residuals(reg)
+adjustedValues <- predict(reg)
+# Reconcile adjusted property values
+discrepancies <- round(assessedDifferences/assessedValues*100, 2)
+adjustedProperties <- data.frame(houseNumbers, assessedValues, adjustedValues, assessedDifferences, discrepancies)
+print(adjustedProperties)
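
The last steps of `filter_csv.R` fit a line through (area total, assessed value) pairs and express each residual as a percentage of the assessed value. As a rough illustration only, the same arithmetic can be sketched in plain Ruby. The names `fit_line` and `discrepancies` and the sample figures are invented for this sketch and do not come from the gem.

```ruby
# Ordinary least squares for a single predictor, the same model that
# lm(assessedValues ~ areaTotals) fits in R. Returns [intercept, slope].
def fit_line(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  slope = sxy / sxx
  [mean_y - slope * mean_x, slope]
end

# For each property: the value predicted by the line, the residual
# (assessed minus predicted), and the residual as a percentage of the
# assessed value, rounded to two decimals as in the R script.
def discrepancies(xs, ys)
  intercept, slope = fit_line(xs, ys)
  xs.zip(ys).map do |x, y|
    predicted = intercept + slope * x
    residual = y - predicted
    { predicted: predicted, residual: residual, percent: (residual / y * 100).round(2) }
  end
end

# Made-up sample data: total areas in square feet and assessed values
areas  = [1000.0, 2000.0, 3000.0]
values = [100_000.0, 250_000.0, 300_000.0]
discrepancies(areas, values).each { |row| puts row }
```

A positive residual means the property sits above the fitted line, which is what the report templates describe as overassessment.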

data/lib/proptax/cli.rb ADDED

@@ -0,0 +1,89 @@
+require 'thor'
+require 'proptax'
+require 'proptax/generators/report'
+module Proptax
+  class CLI < Thor
+    check_unknown_options!
+    # 2016-2-29 http://stackoverflow.com/questions/14346285/how-to-make-two-thor-tasks-share-options
+    shared_options = [:ylimit, {
+                        :type => :string,
+                        :default => "10000",
+                        :desc => "Expand y-axis limits"}]
+    report_options = [:template, {
+                        :type => :string,
+                        :default => "default",
+                        :desc => "Apply specific template: [default, cherry-picked]"}]
+#    consolidate_options = [:consolidate, {
+#                            :type => :boolean,
+#                            :default => true,
+#                            :description => "Consolidate the PDFs before generating the report"}]
+    desc "consolidate DIR", "Outputs CSV data extracted from 2018 property assessment reports"
+    def consolidate(dir)
+      Proptax::Consolidator.process(dir)
+    end
+    desc "reports CSV_FILE", "Generate assessment reports"
+    method_option *shared_options
+    method_option *report_options
+#    method_option *consolidate_options
+    def reports(dir)
+      csv_file = 'consolidated.csv'
+      if options.consolidate?
+        `proptax consolidate #{dir} > #{csv_file}`
+      else
+        csv_file = "#{dir}/consolidated.csv"
+      end
+      Proptax::Generators::Report.start([csv_file, options])
+      generate_material("reports")
+    end
+    desc "auto DIR", "Automatically create CSV file and reports"
+    method_option *shared_options
+    method_option *report_options
+    def auto(dir)
+      `proptax consolidate #{dir} > consolidated.csv`
+      Proptax::Generators::Report.start(['consolidated.csv', options])
+      generate_material("reports")
+    end
+    desc "filter CSV_FILE", "Calculate and display assessment discrepancies"
+    method_option :csv,
+                    :type => :boolean,
+                    :default => false,
+                    :desc => "Output in CSV format"
+    method_option :header,
+                    :type => :boolean,
+                    :default => true,
+                    :desc => "Include header row in CSV format"
+    def filter(csv)
+      data_frame = `Rscript "#{__dir__}"/../R/filter_csv.R "#{csv}"`
+      if options[:csv]
+        lines = data_frame.split("\n")
+        # Print header
+        puts lines[0].squeeze(' ').split(' ').to_csv if options[:header]
+        # Print data (minus R-inserted integer row name)
+        lines[1..-1].each do |line|
+          puts line.squeeze(' ').split(' ')[1..-1].to_csv
+        end
+      else
+        puts data_frame
+      end
+    end
+    no_commands do
+      def generate_material(dir)
+        Dir.foreach(dir) do |file|
+          if /\.Rmd/.match(file)
+            puts "#{file}"
+            `cd "#{dir}" && Rscript -e "library('knitr'); knit('#{file}');"`
+            file.gsub!('.Rmd', '.md')
+            `cd "#{dir}" && Rscript -e "library('knitr'); pandoc('#{file}', format = 'latex');"`
+          end
+        end
+      end
+    end
+  end
+end
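
The `filter` command's `--csv` branch turns the whitespace-aligned table printed by `R` into CSV lines, dropping the row name that `R` prepends to every data row. A standalone sketch of that conversion follows. The method name `data_frame_to_csv` and the sample text are assumptions made for illustration, and the sketch assumes that no field contains spaces.

```ruby
require 'csv'

# Convert whitespace-aligned text, as produced by print(data.frame) in R,
# into CSV. Assumes that no field contains embedded spaces.
def data_frame_to_csv(text, header: true)
  lines = text.lines.map(&:strip).reject(&:empty?)
  out = []
  # The first line holds the column names and has no row name in front
  out << lines.first.split.to_csv if header
  lines.drop(1).each do |line|
    # Drop the leading row name that R inserts before each record
    out << line.split.drop(1).to_csv
  end
  out.join
end

# Invented sample in the shape R prints a two-column data frame
sample = <<~TEXT
    houseNumbers assessedValues
  1          123         400000
  2          125         410000
TEXT

puts data_frame_to_csv(sample)
```

`Array#to_csv` is provided by the `csv` library, so the explicit `require 'csv'` matters when this logic runs outside a context that has already loaded it.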

data/lib/proptax/generators/report/cherry-picked.Rmd ADDED

@@ -0,0 +1,296 @@
+---
+title: 2017 Property Assessment Analysis
+author: Prepared by reports@taxreformyyc.com
+geometry: margin=1.5cm
+---
+```{r loadLibraries, echo=FALSE, message=FALSE}
+# Load the required libraries
+library(knitr)
+library(scales)
+library(formattable)
+library(ggplot2)
+```
+```{r defineConstants, echo=FALSE}
+# Constants
+address <- "<%= address %>"
+myAssessedValue <- <%= assessed_value %>
+csvFile <- "<%= csv_file %>"
+# Get the house number and street name
+#myHouseNumber <- gsub("[^\\d]+", "", address, perl=TRUE)
+m <- regexpr("^\\d+", address, perl=TRUE)
+myHouseNumber <- regmatches(address, m)
+myStreetName <- gsub(".*[\\d]", "", address, perl=TRUE)
+# How do these factors impact an assessment?
+unknownImpact <- c('Valuation.Approach', 'Assessment.Class',
+                   'Property.Type', 'Property.Use', 'Taxation.Status',
+                   'Community', 'Sub.Neighbourhood.Code..SNC.', 'Market.Area',
+                   'Sub.Market.Area', 'Land.Use.Designation',
+                   'Building.Count', 'Building.Type.Structure',
+                   'Year.of.Construction', 'Quality', 'Basement.Suite',
+                   'Walkout.Basement', 'Garage.Type', 'Fireplace',
+                   'Influences', 'Market.Adjustment')
+# These factors are informational and don't affect the assessment
+noImpact <- c('Roll.Number', 'Location.Address')
+```
+```{r loadData, echo=FALSE}
+# Load data
+data <- read.csv(csvFile, header=TRUE)
+```
+```{r getMetaData, echo=FALSE}
+streetAddresses <- data[,noImpact[2]]
+# Remove the street names from the addresses
+#houseNumbers <- as.numeric(gsub("[^\\d]+", "", streetAddresses, perl=TRUE))
+m <- regexpr("^\\d+", streetAddresses, perl=TRUE)
+houseNumbers <- as.numeric(regmatches(streetAddresses, m))
+```
+\center
+`r address`
+\flushleft
+# Synopsis
+This analysis pertains to the property located at **`r address`**.
+It documents the treatment of the assessment data provided by the City of
+Calgary for the pertinent property and those deemed comparable:
+`r kable(data[order(data$Location.Address),]$Location.Address,
+         col.names=c('Comparable Properties'), align=c('l'))`
+The data under investigation was obtained from
+[assessmentsearch.calgary.ca](https://assessmentsearch.calgary.ca) and
+is presented alongside this document in a consolidated CSV file.
+# Approach Overview
+The data investigated in this analysis consists of properties chosen for their
+similar features and proximity to one another. The number of properties chosen
+was maximized to ensure a fair representation of valuations and to support the
+integrity of this report and its conclusion.
+Given that there are many factors contained in the data whose precise impact on
+assessed values is unknown, commonalities are identified and omitted from
+consideration, as all such factors should have identical impact on the
+final assessment.
+Factors that vary are identified and presented for consideration, as
+transparency and integrity are of the utmost importance. Again, the precise
+impact of these factors on the final assessment is unknown, as the weights
+assigned by the City of Calgary are not divulged in the assessment data they
+provide.
+Having acknowledged the factors that cannot easily be quantified, the focus
+turns to the properties' lot sizes, total developed area, and assessed values.
+By visualizing the relationship between these quantifiable factors, the
+pertinent property's assessed value is contrasted with those of the selected
+properties. The conclusion of this analysis is drawn from the underlying data.
+# Identify common factors
+Many of the factors that impact the assessments are identical. This data can
+safely be removed from consideration, as the impact on the assessed values
+should be the same for all the properties under investigation.
+```{r commonFactors, echo=FALSE}
+# Identify common factors
+headers <- c()
+identical <- c()
+displayHeaders <- c()
+for (col in colnames(data)) {
+  values = data[,col][!duplicated(data[,col])]
+  if (length(values) == 1) {
+    headers <- append(headers, col)
+    displayHeaders <-append(displayHeaders, gsub("\\.", " ", col, perl=TRUE))
+    identical <- append(identical, toString(values))
+  }
+}
+commonFactors <- data.frame(headers, displayHeaders, identical)
+```
+Here, **`r length(commonFactors$headers)`** common factors can safely be
+removed from the data set:
+`r kable(data.frame(commonFactors$displayHeaders, commonFactors$identical),
+         col.names=c('Factors', 'Identical Values'), align=c('l', 'r'))`
+```{r removeCommonFactors, echo=FALSE}
+# Remove common factors
+data <- data[,!(names(data) %in% commonFactors$headers)]
+```
+# Identify unknown and non-impacting factors
+Of the **`r length(colnames(data))`** remaining columns, some cannot be
+quantified. Others certainly impact the assessed value of a property, but the
+assessment data provided by the City of Calgary does not reveal to what extent.
+## Non-impacting factors
+```{r noImpactDisplayHeaders, echo=FALSE}
+# Remove the dots from the header name
+noImpactDisplayHeaders <- gsub("\\.", " ", noImpact, perl=TRUE)
+```
+These factors cannot be quantified and are administrative in purpose:
+`r kable(noImpactDisplayHeaders, col.names=c('Non-Impacting Factors'))`
+```{r removeIrrelevantFactors, echo=FALSE}
+data <- data[,!(names(data) %in% noImpact)]
+```
+These are removed and the remaining **`r length(colnames(data))`** columns are
+carried forward.
+## Unknown factors
+The impact these remaining columns have on assessment values is unknown:
+```{r identifyUnknowns, echo=FALSE}
+# Identify unknown factors
+unknownFactors <- names(data[,names(data) %in% unknownImpact])
+```
+`r kable(gsub("\\.", " ", unknownFactors), col.names=c('Unknown Factors'))`
+The variability within these unknown columns is presented here in the interest
+of transparency:
+```{r consolidateUnknownFactors, echo=FALSE}
+consolidatedUnknownFactors <- data[,(names(data) %in% c(unknownFactors))]
+rownames(consolidatedUnknownFactors) <- houseNumbers
+```
+`r kable(consolidatedUnknownFactors[order(as.numeric(row.names(consolidatedUnknownFactors))),],
+         align=c(rep('c', length(unknownFactors))),
+         row.names=TRUE,
+         col.names=gsub("\\.", " ", unknownFactors))`
+These undoubtedly have an impact on the valuation, but their precise weighting
+and significance are not presented in the assessment data provided by the City
+of Calgary. As such, they are removed from the dataset.
+```{r removeUnknownFactors, echo=FALSE}
+# Remove unknown factors from data set
+data <- data[,!(names(data) %in% unknownFactors)]
+```
+The remaining **`r length(colnames(data))`** columns contain the following data:
+```{r addRowNamesToData, echo=FALSE}
+# Add row names to data
+rownames(data) <- houseNumbers
+```
+`r kable(data[order(as.numeric(row.names(data))),], row.names=TRUE, col.names=gsub("\\.", " ", colnames(data)))`
+# Visualization
+The raw data presented above is summarized in Figure 1. It
+illustrates the disparity between the assessed property values. The pertinent
+property is coloured red.
+The blue line running through the graph is the _best fit_ for the visualized model.
+It serves as a predictor, or indicator, as to where the properties in question
+should be positioned.
+The pertinent property's overassessment is determined by measuring the distance
+between the red point and the blue line.
+```{r adjustValues, echo=FALSE}
+# Sum each property's lot size and total developed space
+areaTotals <- rowSums(data[,-1])
+# Isolate all assessed values
+assessedValues <- data[,1]
+# Fit the best fit regression line
+reg <- lm(assessedValues~areaTotals)
+# Compute distances between points and the regression line
+assessedDifferences <- residuals(reg)
+adjustedValues <- predict(reg)
+# Reconcile adjusted property values
+adjustedProperties <- data.frame(houseNumbers, adjustedValues, assessedDifferences)
+```
+```{r generatePlot, fig.cap=paste(address, "Overassessment", sep=" "), fig.width=12, echo=FALSE}
+plot.title <- paste(address, "and Comparable Properties", sep=" ")
+plot.subtitle = 'Current Assessed Property Values'
+ggplot(data, aes(x=areaTotals, y=assessedValues)) +
+    # Plot title
+    ggtitle(bquote(atop(bold(.(plot.title)), atop(italic(.(plot.subtitle)), "")))) +
+    theme(plot.title=element_text(size=24, hjust = 0.5)) +
+    # Axis labels
+    ylab("Assessed House Values on your Street") +
+    xlab("Total House and Lot Size (Sq. Feet)") +
+    theme(axis.title.x=element_text(size=18, face="bold", margin=margin(20,0,0,0)),
+          axis.title.y=element_text(size=18, face="bold", margin=margin(0,20,0,0))) +
+    # Axis tick labels
+    scale_x_continuous(labels=comma) +
+<% if opts.ylimit? %>
+    scale_y_continuous(labels=dollar, breaks=pretty_breaks(n=10),
+                       limits=c(min(assessedValues)-<%= opts.ylimit %>, max(assessedValues)+<%= opts.ylimit %>)) +
+<% else %>
+    scale_y_continuous(labels=dollar, breaks=pretty_breaks(n=10)) +
+<% end %>
+    # Scatter plot points
+    geom_point(shape=ifelse(assessedValues==myAssessedValue, 16, 1),
+               size=ifelse(assessedValues==myAssessedValue, 5, 4),
+               colour=ifelse(assessedValues==myAssessedValue, "red", "blue")) +
+    # Point labels
+    geom_text(aes(label=houseNumbers), hjust=0.5, vjust=-2, size=5) +
+    geom_text(aes(label=paste("$", accounting(assessedValues, format="d"), sep="")),
+              hjust=0.5, vjust=-1.2) +
+    # Best fit line
+    geom_smooth(method=lm)
+```
+# Conclusion
+The data investigated in this analysis describes the factors considered in
+assessing the property located at `r address`.
+It was collected and provided by the City of Calgary. This report set out to
+quantify the disparity between the pertinent property and similar properties
+in the neighbourhood.
+Common factors were identified and eliminated from the analysis. Similarly,
+varying factors were identified, catalogued, and removed from consideration. The
+City of Calgary's property reports do not describe how these characteristic
+features are quantified and weighted in determining a property's assessed
+value. As such, they could not be included.
+The conclusions that follow are drawn from comparing lot sizes, total developed
+area, and assessed values. The underlying data and the overall approach have
+been presented in full.
+## Overvaluation: `r currency(adjustedProperties[houseNumbers==myHouseNumber,]$assessedDifferences)`
+This investigation compared the assessed values with lot sizes and developed
+square footage. It has revealed that the pertinent property is overvalued
+by
+**`r currency(adjustedProperties[houseNumbers==myHouseNumber,]$assessedDifferences)`**.
+## Corrected Assessed Value: `r currency(adjustedProperties[houseNumbers==myHouseNumber,]$adjustedValues)`
+In order to bring the pertinent property's assessed value in line with those of
+similar properties in the neighbourhood, it must be reassessed at
+**`r currency(adjustedProperties[houseNumbers==myHouseNumber,]$adjustedValues)`**.
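
The report template's "Identify common factors" step keeps every column whose values are identical across all properties and then drops those columns from the analysis. For readers more at home in Ruby than `R`, a minimal sketch of that detection is given here. The method name `common_factors` and the sample records are invented for illustration and are not part of the gem.

```ruby
require 'csv'

# Return { header => value } for every column in which all rows share
# one identical value, mirroring the duplicated() check in the template.
def common_factors(table)
  table.headers.each_with_object({}) do |header, common|
    values = table[header].uniq
    common[header] = values.first if values.size == 1
  end
end

# Invented sample records using column names that appear in the template
csv_text = <<~CSV
  Location.Address,Community,Assessed.Value
  123 FAKE ST NW,EXAMPLE,400000
  125 FAKE ST NW,EXAMPLE,410000
CSV

table = CSV.parse(csv_text, headers: true)
common = common_factors(table)
common.each { |header, value| puts "#{header}: #{value}" }

# Headers that remain after removing the common factors
puts (table.headers - common.keys).join(', ')
```

In this sample only `Community` is shared by every row, so it would be listed as a common factor and excluded, while the address and assessed value columns are carried forward.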