PyPI - morpc - Versions diffs - 0.3.2__tar.gz → 0.3.4__tar.gz - Mend

morpc 0.3.2.tar.gz → 0.3.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (57)

{morpc-0.3.2 → morpc-0.3.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: morpc
-Version: 0.3.2
+Version: 0.3.4
 Summary: Data management tools used by MORPC
 Author-email: MORPC data team <dataandmaps@morpc.org>
 License-Expression: MIT

morpc-0.3.4/docs/05-morpc-geos-demo.ipynb ADDED

@@ -0,0 +1,411 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "2e7c0e95-72d3-4b78-85c1-7679c8b50d75",
+   "metadata": {},
+   "source": [
+    "# Spatial Data Tools"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f0a9134-295d-4a94-b91c-1f152350634d",
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "## Load spatial data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "63ec4df4-eec2-475d-936a-e93db94f52a3",
+   "metadata": {},
+   "source": [
+    "Often we want to make a copy of some input data and work with the copy, for example to protect the original data or to create an archival copy of it so that we can replicate the process later. With tabular data this is simple, but with spatial data it can be tricky. A Shapefile consists of several component files (.shp, .shx, .dbf, .prj, and others), so all of them must be copied. A geodatabase may contain many layers in addition to the one we care about. The `load_spatial_data()` function simplifies the process of reading the data and (optionally) making an archival copy. It accepts the following parameters:\n",
+    "  - `sourcePath` - The path to the geospatial data. It may be a file path or URL. For a Shapefile, this should point to the .shp file or to a zipped file that contains all of the Shapefile components. You can point to other zipped contents as well, but see the caveats for `driverName` below.\n",
+    "  - `layerName` (required for GPKG and GDB, optional for SHP) - The name of the layer that you wish to extract from a GeoPackage or File Geodatabase. Not required for Shapefiles, but may be specified for use in the archival copy (see below).\n",
+    "  - `driverName` (required for zipped data or data with a non-standard file extension) - Which [GDAL driver](https://gdal.org/drivers/vector/index.html) to use to read the file. The function will attempt to infer the driver from the file extension, but you must specify it if the data is zipped, if the file extension is non-standard, or if the extension cannot be determined from the path (e.g. if the path is an API query).\n",
+    "  - `archiveDir` (optional) - The path to the directory where a copy of the data should be archived. If this is specified, the data will be archived in this location as a GeoPackage. The function will determine the file name and layer name from the specified parameters, using generic values if necessary.\n",
+    "  - `archiveFileName` (optional) - If `archiveDir` is specified, you may use this to specify the name of the archival GeoPackage. Omit the extension. If this is unspecified, the function will assign the file name automatically, using a generic value if necessary.\n",
+    "  \n",
+    "The following examples load data from the Census TIGER/Line website and from MORPC's Mid-Ohio Open Data website; however, you can also load data from a local path or network drive."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "344971d0-83be-4410-b2d9-ee1ae90250cf",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import geopandas as gpd\n",
+    "import morpc\n",
+    "import os"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "b8c36fa4-b7dd-43d4-9574-7a640d62a22b",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "morpc.load_spatial_data | INFO | Loading spatial data from location: https://www2.census.gov/geo/tiger/TIGER2024/METDIV/tl_2024_us_metdiv.zip\n",
+      "morpc.load_spatial_data | INFO | Attempting to load data from Census FTP site. Using wget to archive file.\n",
+      "morpc.load_spatial_data | WARNING | Data from Census FTP must be temp saved. Using ./temp_data.\n",
+      "morpc.load_spatial_data | INFO | Using driver Census Shapefile as specified by user.\n",
+      "morpc.load_spatial_data | INFO | Reading spatial data...\n",
+      "morpc.load_spatial_data | INFO | File name is unspecified.  Will infer file name from source path.\n",
+      "morpc.load_spatial_data | INFO | Using automatically-selected file name: tl_2024_us_metdiv\n",
+      "morpc.load_spatial_data | INFO | Layer name is unspecified. Using automatically-selected layer name: tl_2024_us_metdiv\n",
+      "morpc.load_spatial_data | INFO | Creating archival copy of geospatial layer at ./temp_data\\tl_2024_us_metdiv.gpkg, layer tl_2024_us_metdiv\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "C:\\Users\\jinskeep\\morpc_venv\\Lib\\site-packages\\pyogrio\\raw.py:198: RuntimeWarning: driver ESRI Shapefile does not support open option DRIVER\n"
+     ]
+    }
+   ],
+   "source": [
+    "url = 'https://www2.census.gov/geo/tiger/TIGER2024/METDIV/tl_2024_us_metdiv.zip'\n",
+    "gdf = morpc.load_spatial_data(url, archiveDir='./temp_data')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3573a45b-048b-4223-9c99-92219ab4db59",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Create a directory to store the archival data (for demonstration purposes only)\n",
+    "if not os.path.exists(\"./temp_data\"):\n",
+    "    os.makedirs(\"./temp_data\")\n",
+    "\n",
+    "# Load the data and create an archival copy\n",
+    "gdf = morpc.load_spatial_data(\n",
+    "    sourcePath=\"https://opendata.arcgis.com/api/v3/datasets/e42b50fbd17a47739c2a7695778c498e_17/downloads/data?format=shp&spatialRefId=3735&where=1%3D1\", \n",
+    "    layerName=\"MORPC MPO Boundary\",\n",
+    "    driverName=\"ESRI Shapefile\",\n",
+    "    archiveDir=\"./temp_data\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "534d5386-9d8c-490e-941b-92c0b67ec65a",
+   "metadata": {},
+   "source": [
+    "Let's take a look at the data and make sure it loaded correctly."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f8ee189b-6326-4038-b193-2e3184a09ac6",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "gdf.drop(columns=\"Updated\").explore() ## avoid datetime column JSON error"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ba2a1e26-6d2f-469d-a71c-4a096bfeffc5",
+   "metadata": {},
+   "source": [
+    "Now let's read the archival copy and make sure it looks the same.  We'll use the `load_spatial_data()` function again, but this time we won't make an archival copy."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d2cc1a3f-9f78-4044-86de-6f3a6502b6de",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "gdfArchive = morpc.load_spatial_data(\"./temp_data/MORPC MPO Boundary.gpkg\", layerName=\"MORPC MPO Boundary\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4ea6c47d-30fc-4e72-96b6-338666130bdd",
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "## Assign geographic identifiers"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29c482fe-76bc-4ae1-84cf-48e60dad52be",
+   "metadata": {},
+   "source": [
+    "Sometimes we have a set of locations and we would like to know which geography (county, ZIP code, etc.) each one falls in. The `assign_geo_identifiers()` function takes a set of georeferenced points and a list of geography levels and determines, for each level, which area each point falls in. The function takes two parameters:\n",
+    "  - `points` - a GeoPandas GeoDataFrame consisting of the points of interest\n",
+    "  - `geographies` - A Python list of one or more strings in which each element corresponds to a geography level. You can specify as many levels as you want from the following list. Note, however, that the function must download the polygons and perform the analysis for each level, so specifying many levels may take a long time.\n",
+    "    - \"county\" - County (Census TIGER)\n",
+    "    - \"tract\" - *Not currently implemented*\n",
+    "    - \"blockgroup\" - *Not currently implemented*\n",
+    "    - \"block\" - *Not currently implemented*\n",
+    "    - \"zcta\" - *Not currently implemented*\n",
+    "    - \"place\" - Census place (Census TIGER)\n",
+    "    - \"placecombo\" - *Not currently implemented*\n",
+    "    - \"juris\" - *Not currently implemented*\n",
+    "    - \"region15County\" - *Not currently implemented*\n",
+    "    - \"region10County\" - *Not currently implemented*\n",
+    "    - \"regionCORPO\" - *Not currently implemented*\n",
+    "    - \"regionMPO\" - *Not currently implemented*\n",
+    "\n",
+    "**NOTE:** Many of the geography levels are not currently implemented.  They are being implemented as they are needed.  If you need one that has not yet been implemented, please contact Adam Porr (or implement it yourself)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cad973e5-cf05-4b46-8d18-a3fddd07e93f",
+   "metadata": {},
+   "source": [
+    "In the following example, we will assign labels for the \"county\" and \"place\" geography levels to libraries in MORPC's Points of Interest layer.  First we'll download just the library locations from Mid-Ohio Open Data using the ArcGIS REST API."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2ee41a03-52bc-4e14-843a-54f70d73982a",
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "url = \"https://services1.arcgis.com/EjjnBtwS9ivTGI8x/arcgis/rest/services/Points_of_Interest/FeatureServer/0/query?outFields=*&where=%22type%22=%27Library%27&f=geojson\"\n",
+    "librariesRaw = gpd.read_file(url)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1db0a93b-5f7a-4a6a-ade1-d25b33477da8",
+   "metadata": {},
+   "source": [
+    "The data includes many fields that we don't need. For clarity, keep only the relevant fields."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "61f3f15d-072d-486f-abf3-6fbdd4f71fda",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "libraries = librariesRaw.copy().filter(items=['NAME', 'ADDRESS','geometry'], axis=\"columns\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c6d75af5-6020-4e04-8245-298ddcf7ebb0",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "libraries.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "776f3718-24d5-4e56-b51b-89b796d9794b",
+   "metadata": {},
+   "source": [
+    "Let's take a look at the library locations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5f1c2dee-3fbb-4515-b04a-844efcab875d",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "libraries.explore(style_kwds={\"radius\":4})"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8ade8d86-39df-4f3f-8856-08c03d1f18a9",
+   "metadata": {},
+   "source": [
+    "Use the `assign_geo_identifiers()` function to iterate through the requested geography levels (in this case \"county\" and \"place\"), labeling each point with the identifier of the geography in each level where the point is located."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a37eb22f-3cbd-4681-8715-a331b1a16703",
+   "metadata": {},
+   "source": [
+    "### Run `assign_geo_identifiers()`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9ef7876e-8f9d-4539-99b1-02056e7d8251",
+   "metadata": {},
+   "source": [
+    "**NOTE:** This function is currently broken because changes at the Census Bureau prevent it from loading TIGER/Line files from the Census FTP site."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "58025778-8cdc-4e25-9cd9-45895f165673",
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "librariesEnriched = morpc.assign_geo_identifiers(libraries, [\"county\",\"place\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "af64d50e-779e-404c-92af-978205aa7f61",
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "Note that two columns have been added to the dataframe: `id_county`, which contains the identifier for the county where the library is located, and `id_place`, which contains the identifier for the place."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c4a8debd-a9e8-4973-8d44-57012206449b",
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "librariesEnriched.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ffb89cbd-81f4-44f9-93da-f3a656e44323",
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "Let's take a look at the libraries, symbolizing each one according to the county where it is located."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "576aeba6-9978-4b1b-b846-46d4db0d8087",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "librariesEnriched.explore(column=\"id_county\", style_kwds={\"radius\":4})"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "614c48cc-36cc-44be-8211-a96ec3c7577d",
+   "metadata": {},
+   "source": [
+    "Let's take another look, this time symbolizing each library according to the place where it is located.  The legend has been suppressed because there are too many unique values, but you can hover over each point to see the place identifier that has been assigned to it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "45ffcdad-dfa2-4e8d-b684-ffeb6fb1b12d",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "librariesEnriched.explore(column=\"id_place\", style_kwds={\"radius\":4}, legend=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dcf66a7e-9f88-48c0-ad7c-e2ebf2876809",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
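The `load_spatial_data()` description in the notebook above says the GDAL driver is inferred from the file extension unless the data is zipped, has a non-standard extension, or comes from an API query, and that the archive file and layer names fall back to generic values. The sketch below illustrates only that decision logic. The helper `infer_driver_and_names()` and its extension table are hypothetical and are not morpc's implementation; the GDAL driver short names (`ESRI Shapefile`, `GPKG`, `OpenFileGDB`, `GeoJSON`) are real. The fallback order (layer name, then source file stem, then `"data"`) is an assumption chosen to match the file and layer names that appear in the notebook's log output and archive path.

```python
import os
from urllib.parse import urlparse

# Hypothetical extension-to-driver table (real GDAL short names); not morpc's actual mapping.
EXTENSION_TO_DRIVER = {
    ".shp": "ESRI Shapefile",
    ".gpkg": "GPKG",
    ".gdb": "OpenFileGDB",
    ".geojson": "GeoJSON",
    ".json": "GeoJSON",
}


def infer_driver_and_names(sourcePath, driverName=None, layerName=None, archiveFileName=None):
    """Return (driverName, archiveFileName, layerName) for a local path or URL (hypothetical helper)."""
    # For URLs, use only the path component so query strings are ignored
    path = urlparse(sourcePath).path if "://" in sourcePath else sourcePath
    stem, extension = os.path.splitext(os.path.basename(path.rstrip("/")))
    extension = extension.lower()

    if driverName is None:
        # Zipped data (.zip), non-standard extensions, and API queries need an explicit driver
        if extension not in EXTENSION_TO_DRIVER:
            raise ValueError(f"Cannot infer a driver for {sourcePath!r}; specify driverName")
        driverName = EXTENSION_TO_DRIVER[extension]

    # Assumed fallback order for the archive file name: layer name, source file stem, generic value
    if archiveFileName is None:
        archiveFileName = layerName or stem or "data"
    # Reuse the archive file name as the layer name when none was given
    if layerName is None:
        layerName = archiveFileName
    return driverName, archiveFileName, layerName


print(infer_driver_and_names("./temp_data/MORPC MPO Boundary.gpkg", layerName="MORPC MPO Boundary"))
# ('GPKG', 'MORPC MPO Boundary', 'MORPC MPO Boundary')
```

Raising an error instead of guessing mirrors the notebook's rule that zipped or extensionless sources must name their driver explicitly.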

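The `assign_geo_identifiers()` description in the notebook amounts to a point-in-polygon join: for each geography level, each point is labeled with the identifier of the polygon that contains it. The stdlib-only sketch below illustrates that idea with the ray-casting test on two made-up square polygons. The names `point_in_polygon()` and `assign_ids()` are hypothetical, and morpc's own implementation is not shown in this diff; with real Census polygons, a GeoPandas spatial join such as `geopandas.sjoin(points, polygons, predicate="within")` would be the more practical tool.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring.

    `ring` is a list of (x, y) vertices; repeating the first vertex at the end is optional.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count crossings of a horizontal ray extending to the right of the point
        if (y1 > y) != (y2 > y):
            xCross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xCross:
                inside = not inside
    return inside


def assign_ids(points, polygons):
    """Label each point with the id of the first polygon that contains it, or None."""
    return {
        pointId: next((polyId for polyId, ring in polygons.items() if point_in_polygon(x, y, ring)), None)
        for pointId, (x, y) in points.items()
    }


# Made-up geography level with two adjacent squares, and three made-up points
polygons = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = {"p1": (5, 5), "p2": (15, 2), "p3": (25, 5)}
print(assign_ids(points, polygons))  # {'p1': 'A', 'p2': 'B', 'p3': None}
```

Running `assign_ids()` once per geography level and storing each result in its own column is what produces one identifier column per level, such as `id_county` and `id_place` in the notebook.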
{morpc-0.3.2 → morpc-0.3.4}/docs/07-morpc-census-demo.ipynb RENAMED

@@ -614,14 +614,6 @@
    "source": [
     "dim_table.loc[dim_table['Variable type'] == 'Estimate'].head()"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "730020a9-0f68-4441-a20b-1c2f1abaff74",
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {

{morpc-0.3.2 → morpc-0.3.4}/morpc/__init__.py RENAMED

@@ -1,4 +1,4 @@
-__version__ = "0.3.2"
+__version__ = "0.3.4"
 from .morpc import *
 import morpc.frictionless

{morpc-0.3.2 → morpc-0.3.4}/morpc/census/census.py RENAMED

@@ -10,14 +10,20 @@ ACS_ID_FIELDS = {
         {"name":"SUMLEVEL", "type":"string", "description":"Code representing the geographic summary level for the data"},
         {"name":"STATE","type":"string","description":"Unique identifier for state in which geography is located"},
         {"name":"COUNTY","type":"string","description":"Unique identifier for county in which geography is located"},
-        {"name":"TRACT","type":"string","description":"Unique identifier for tract in which geography is located"}
+        {"name":"TRACT","type":"string","description":"Unique identifier for tract in which geography is located"}
     ],
     "tract": [
         {"name":"GEO_ID", "type":"string", "description":"Unique identifier for geography"},
         {"name":"SUMLEVEL", "type":"string", "description":"Code representing the geographic summary level for the data"},
         {"name":"STATE","type":"string","description":"Unique identifier for state in which geography is located"},
         {"name":"COUNTY","type":"string","description":"Unique identifier for county in which geography is located"}
-    ],
+    ],
+    "county subdivision": [
+        {"name":"GEO_ID", "type":"string", "description":"Unique identifier for geography"},
+        {"name":"SUMLEVEL", "type":"string", "description":"Code representing the geographic summary level for the data"},
+        {"name":"STATE","type":"string","description":"Unique identifier for state in which geography is located"},
+        {"name":"COUNTY","type":"string","description":"Unique identifier for county in which geography is located"}
+    ],
     "county": [
         {"name":"GEO_ID", "type":"string", "description":"Unique identifier for geography"},
         {"name":"SUMLEVEL", "type":"string", "description":"Code representing the geographic summary level for the data"},
@@ -146,13 +152,13 @@ ACS_AGEGROUP_SORT_ORDER = {
     '80 to 84 years': 17,
     '85 years and over': 18
 }
 def api_get(url, params, varBatchSize=20, verbose=True):
     """
     api_get() is a low-level wrapper for Census API requests that returns the results as a pandas dataframe. If necessary, it
     splits the request into several smaller requests to bypass the 50-variable limit imposed by the API.  The resulting dataframe
     is indexed by GEOID (regardless of whether it was requested) and omits other fields that are not requested but which are returned
-    automatically with each API request (e.g. "state", "county")
+    automatically with each API request (e.g. "state", "county")
     Parameters
     ----------
@@ -278,6 +284,39 @@ def api_get(url, params, varBatchSize=20, verbose=True):
 #         |--------------|----------------|---------------------|-------------------------|
 #         | B25127_004E  | Owner occupied | Built 2020 or later | 1, detached or attached |
 #
+def acs_variables_by_group(groupNumber, acsYear, acsSurvey):
+    """
+    Get all variables that belong to a Census variable group.
+    Parameters
+    ----------
+    groupNumber : str
+        The group number to search for within the variables table, e.g. "B11001"
+    acsYear : str
+        The year of the survey, e.g. "2023"
+    acsSurvey : str
+        The ACS survey to get variables for, e.g. "1" or "5"
+    Returns
+    -------
+    dict
+        A dict mapping each variable name in the group to its attributes (label, concept, etc.).
+    """
+    import requests
+    r = requests.get(f'https://api.census.gov/data/{acsYear}/acs/acs{acsSurvey}/variables.json')
+    r.raise_for_status()
+    allVariables = r.json()['variables']
+    # Keep only the variables whose "group" attribute matches the requested group
+    variables = {}
+    for variable, attributes in allVariables.items():
+        if attributes.get('group') == groupNumber:
+            variables[variable] = attributes
+    return variables
 def acs_label_to_dimensions(labelSeries, dimensionNames=None):
     """
     acs_label_to_dimensions(labelSeries, dimensionNames=None)
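The new `acs_variables_by_group()` downloads `variables.json` for an ACS year and survey and keeps the entries whose `group` matches the requested group. The sketch below separates that filtering step from the HTTP request so it can run offline, then chunks the resulting variable names into batches, similar in spirit to the way `api_get()` splits requests to stay under the API's 50-variable limit (its actual batching code is not shown in this diff). The helper names and the abbreviated sample payload are hypothetical; the payload only mimics the `{"variables": {name: {..., "group": ...}}}` shape that the function indexes into.

```python
def variables_in_group(variablesJson, groupNumber):
    """Keep only the entries of a variables.json payload whose "group" matches groupNumber."""
    return {
        name: attributes
        for name, attributes in variablesJson["variables"].items()
        if attributes.get("group") == groupNumber
    }


def batch_variable_names(names, varBatchSize=20):
    """Split variable names into sorted lists of at most varBatchSize names each."""
    names = sorted(names)
    return [names[i:i + varBatchSize] for i in range(0, len(names), varBatchSize)]


# Hypothetical, heavily abbreviated payload in the shape of variables.json
sample = {
    "variables": {
        "B11001_001E": {"group": "B11001"},
        "B11001_002E": {"group": "B11001"},
        "B01001_001E": {"group": "B01001"},
    }
}

group = variables_in_group(sample, "B11001")
for batch in batch_variable_names(group, varBatchSize=20):
    # Each batch could become the comma-separated "get" parameter of one API request
    print(",".join(batch))  # B11001_001E,B11001_002E
```

Keeping the filter as a pure function means it can be tested without network access, while the real function only adds the `requests.get()` call in front of it.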

{morpc-0.3.2 → morpc-0.3.4}/morpc/frictionless/frictionless.py RENAMED

@@ -36,7 +36,7 @@ def name_to_desc_map(schema):
 # Given a dataframe and the Frictionless Schema object (see load_schema), recast each of the fields in the
 # dataframe to the data type specified in the schema.
-def cast_field_types(df, schema, forceInteger=False, handleMissingFields="error", verbose=True):
+def cast_field_types(df, schema, forceInteger=False, forceInt64=False, handleMissingFields="error", verbose=True):
     import frictionless
     import pandas as pd
     import shapely
@@ -64,8 +64,13 @@ def cast_field_types(df, schema, forceInteger=False, handleMissingFields="error"
         # the field must be cast as "Int64" instead.
         if((fieldType == "int") or (fieldType == "integer")):
             try:
-                # Try to cast the field as an "int".  This will fail if nulls are present.
-                outDF[fieldName] = outDF[fieldName].astype("int")
+                if(forceInt64 == True):
+                    # Cast all integer fields as Int64 whether this is necessary or not.  This is useful when trying to merge
+                    # dataframes with mixed int32 and Int64 values.
+                    outDF[fieldName] = outDF[fieldName].astype("Int64")
+                else:
+                    # Try to cast the field as an "int".  This will fail if nulls are present.
+                    outDF[fieldName] = outDF[fieldName].astype("int")
             except:
                 try:
                     # Try to cast as "Int64", which supports nulls. This will fail if the fractional part is non-zero.
@@ -472,7 +477,7 @@ def validate_resource(resourcePath, verbose=True):
             print(results)
         return False
-def load_data(resourcePath, archiveDir=None, validate=False, verbose=True):
+def load_data(resourcePath, archiveDir=None, validate=False, forceInteger=False, forceInt64=False, verbose=True):
     """Often we want to make a copy of some input data and work with the copy, for example to protect
     the original data or to create an archival copy of it so that we can replicate the process later.
     The `load_data()` function simplifies the process of reading the data and
@@ -488,6 +493,15 @@ def load_data(resourcePath, archiveDir=None, validate=False, verbose=True):
     validate : bool
         Optional. If True, the resource file, schema file, and data file will be validated.  If archiveDir is
         specified, the copies of the files will be validated.  If not, the original files will be validated.
+        Defaults to False.
+    forceInteger : bool
+        Optional. If True, then try harder to cast integer fields.  This may involve rounding the values to the ones place.
+        Defaults to False.
+    forceInt64 : bool
+        Optional. If True, then cast all integer fields as Int64 regardless of whether this is necessary.  This is useful
+        when trying to merge dataframes which would otherwise have mixed int32 and Int64 fields. Defaults to False.
+    verbose : bool
+        Optional.  If False, then most output will be suppressed.  Defaults to True.
     Returns
     -------
@@ -559,7 +573,7 @@ def load_data(resourcePath, archiveDir=None, validate=False, verbose=True):
         print("morpc.load_data | ERROR | Unknown data file extension: {}".format(dataFileExtension))
         raise RuntimeError
-    df = cast_field_types(df, resource.schema, verbose=verbose)
+    df = cast_field_types(df, resource.schema, forceInteger=forceInteger, forceInt64=forceInt64, verbose=verbose)
     return df, resource, resource.schema
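The new `forceInt64` option makes `cast_field_types()` cast every integer field to pandas' nullable `Int64` dtype instead of trying NumPy `int` first, which, per the docstring, helps when merging dataframes that would otherwise have mixed int32 and Int64 fields. The sketch below reproduces only that branch of the logic on a plain list of field names. `cast_integer_fields()` is a hypothetical helper, not part of morpc, and it catches `ValueError`/`TypeError` where the diff uses a bare `except`.

```python
import pandas as pd


def cast_integer_fields(df, fieldNames, forceInt64=False):
    """Cast the named fields to an integer dtype (hypothetical helper mirroring the diff's branches)."""
    outDF = df.copy()
    for fieldName in fieldNames:
        if forceInt64:
            # Nullable Int64 for every integer field, whether or not it contains nulls
            outDF[fieldName] = outDF[fieldName].astype("Int64")
        else:
            try:
                # NumPy integer; this fails if the column contains nulls
                outDF[fieldName] = outDF[fieldName].astype("int")
            except (ValueError, TypeError):
                # Nullable integer; this fails if any value has a non-zero fractional part
                outDF[fieldName] = outDF[fieldName].astype("Int64")
    return outDF


df = pd.DataFrame({"complete": [1, 2, 3], "withNulls": [1.0, None, 3.0]})
print(cast_integer_fields(df, ["complete", "withNulls"]).dtypes)
print(cast_integer_fields(df, ["complete", "withNulls"], forceInt64=True).dtypes)
```

Without `forceInt64`, the null-free column ends up as a NumPy integer (32-bit on Windows under NumPy 1.x, which is a likely source of the int32 mentioned in the docstring) while the column with nulls becomes `Int64`; with it, both columns share one dtype.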
