spandex's Issues

A love letter to dependencies: describing an environment and set of commits in which spandex runs

The TableLoader class is sensitive to the version of Anaconda and SQLAlchemy that one is using.

For example, this commit 5356cf9 was necessary to get these scripts to work: https://github.com/synthicity/bayarea_urbansim/blob/master/data_regeneration/run.py#L35-L38

However, that same commit breaks those same scripts (https://github.com/synthicity/bayarea_urbansim/blob/master/data_regeneration/run.py#L35-L38) on Anaconda 2.1.0 on this machine: https://github.com/MetropolitanTransportationCommission/bayarea_urbansim_setup/tree/vagrant-ubuntu14-bloomberg

Perhaps this would be a good argument for CI? Or very up-front documentation of current requirements?

Create from point / impute from point functions

Function to add/impute values in the disaggregate data (e.g. buildings, establishments) from spatial points representing observed values, such as from commercial data sources (e.g. InfoUSA, Metrostudy, Costar, Exceligent, REIS, Axiometric, Real Facts, D&B).

Examples of create from point:


create_from_point(buildings, commercial_data_source, how='replace')
create_from_point(buildings, commercial_data_source, how='add')

Examples of impute from point:


impute_from_point(buildings, commercial_data_source, {'unit_price':'price', 'non_residential_sqft':'sqft'})
impute_from_point(buildings, commercial_data_source, criteria='unit_price<50000')
impute_from_point(buildings, commercial_data_source, within=50)
impute_from_point(buildings, commercial_data_source, aggregation=('zone_id','mean'))
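
One possible shape for impute_from_point, sketched with geopandas (an assumption; the function name, the within radius, and the column mapping mirror the examples above, but everything else is illustrative):

# A minimal sketch, assuming GeoDataFrame inputs in a common projected CRS and
# geopandas >= 0.10 (for sjoin_nearest); all names are illustrative.
import geopandas as gpd

def impute_from_point(targets, points, columns, within=None):
    """Fill missing values in `targets` from the nearest observed point.

    columns: mapping of {target_column: point_column}, e.g.
             {'unit_price': 'price', 'non_residential_sqft': 'sqft'}.
    within:  optional maximum search radius in CRS units.
    """
    joined = gpd.sjoin_nearest(targets, points, how='left', max_distance=within)
    joined = joined[~joined.index.duplicated()]  # keep one nearest match per target row
    out = targets.copy()
    for target_col, point_col in columns.items():
        missing = out[target_col].isna()
        out.loc[missing, target_col] = joined.loc[missing, point_col]
    return out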

test_tableframe test fails

In this environment, which is as close to the Travis config in the repo as I could get (https://github.com/MetropolitanTransportationCommission/bayarea_urbansim_setup/tree/vagrant-ubuntu14-giuliani), test_tableframe fails.

vagrant@vagrant-ubuntu-trusty-64:/vm_project_dir/spandex$ py.test --cov "/home/vagrant/miniconda/lib/python2.7/site-packages/spandex" --cov-report term-missing
=============================================== test session starts ===============================================
platform linux2 -- Python 2.7.9 -- py-1.4.27 -- pytest-2.7.1
rootdir: /home/vagrant/miniconda/lib/python2.7/site-packages, inifile:
plugins: cov
collected 74 items

../../home/vagrant/miniconda/lib/python2.7/site-packages ...................................................................Fs.....Coverage.py warning: Module /home/vagrant/miniconda/lib/python2.7/site-packages/spandex was never imported.
Coverage.py warning: No data was collected.


==================================================== FAILURES =====================================================
_________________________________________________ test_tableframe _________________________________________________

loader = <spandex.io.TableLoader object at 0x7fe4f1088390>

    def test_tableframe(loader):
        table = loader.tables.sample.hf_bg
        for cache in [False, True]:
            tf = TableFrame(table, index_col='gid', cache=cache)
>           assert isinstance(tf.index, pd.Index)

spandex/tests/test_io.py:14:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
spandex/io.py:483: in index
    index_col=self._index_col).index
spandex/io.py:741: in db_to_df
    q = db_to_query(query)
spandex/io.py:672: in db_to_query
    return sess.query(*orm)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <sqlalchemy.orm.attributes.InstrumentedAttribute object at 0x7fe4f07ee110>

>   ???
E   NotImplementedError: Class <class 'sqlalchemy.orm.attributes.InstrumentedAttribute'> is not iterable

build/bdist.linux-x86_64/egg/sqlalchemy/sql/operators.py:316: NotImplementedError
--------------------------------- coverage: platform linux2, python 2.7.9-final-0 ---------------------------------
Name    Stmts   Miss  Cover   Missing
-------------------------------------
================================= 1 failed, 72 passed, 1 skipped in 32.30 seconds =================================

Reshape (clip) function

Function to reshape (clip) one geometry by another geometry.

[image: reshape/clip illustration]

A reshape (or clip) operation is used to adjust parcel boundaries in cases where this is appropriate. The most common application of this operation is when part of a parcel is underwater. We are interested in the land associated with parcels, so the underwater portion of parcels should be clipped away.

The main idea is that st_area(parcel.geom) should yield the land area. An example of where this operation would be applied: land parcels in San Mateo County that sit along the bay or ocean but extend far out into the water.
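
A hedged sketch of the clip step using geopandas (the layer names and paths are assumptions; a production implementation might instead use PostGIS ST_Difference):

import geopandas as gpd

parcels = gpd.read_file('parcels.shp')   # illustrative inputs
water = gpd.read_file('water.shp')

# Remove the underwater portion of each parcel so that the parcel area reflects land area.
clipped = gpd.overlay(parcels, water[['geometry']], how='difference')
clipped['land_area'] = clipped.geometry.area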

Calculate distance function

Calculate distance between one geometry and another.

Examples:


calc_dist(parcels, canyon_edge, how='network')
calc_dist(parcels, transit, how='network')
calc_dist(parcels, water_bodies, how='straight_line')
calc_dist(parcels, highway, how='straight_line')
calc_dist(parcels, air_pollutant_source, how='straight_line')
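
A rough sketch of the straight-line case, assuming GeoDataFrames in a projected CRS; the network case would require a routing/accessibility library (e.g. pandana) and is omitted here:

def calc_dist(parcels, features, how='straight_line'):
    if how != 'straight_line':
        raise NotImplementedError('network distances need a routing engine')
    # distance from each parcel geometry to the nearest piece of the feature layer
    nearest = features.unary_union
    return parcels.geometry.distance(nearest)

# e.g. parcels['dist_to_water'] = calc_dist(parcels, water_bodies)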

Slice function

Function to slice geometry along the boundaries of another geometry.

[image: slice illustration]

Example SQL (using the small spandex test data to slice parcels along block group boundaries):


WITH a AS (
    SELECT
        heather_farms.parcel_id,
        heather_farms.calc_area,
        heather_farms.shape_area,
        heather_farms.parcel_acr,
        heather_farms.shape_leng,
        st_intersection(heather_farms.geom, hf_bg.geom) AS geom
    FROM heather_farms, hf_bg
    WHERE st_intersects(heather_farms.geom, hf_bg.geom)
), b AS (
    SELECT *, st_area(geom) AS icalc_area FROM a
)
SELECT
    parcel_id,
    geom,
    icalc_area / calc_area * shape_area AS shape_area,
    icalc_area / calc_area * parcel_acr AS parcel_acr,
    icalc_area / calc_area * shape_leng AS shape_leng
FROM b;

This function will typically be applied in the context of parcels. Post-slice child parcels will be assigned field values from the parent parcel, with the user specifying which fields take the parent value as-is and which fields allocate the parent value to the children weighted by area. In the example above, parcel_id is taken from the parent parcel as-is, and shape_area/parcel_acr/shape_leng are allocated by area.

Options for preventing slivers will be provided. For parcel slices that result in slivers, the parent parcel is left intact. Slivers can be defined by area (e.g. area < 500) and/or by shape (e.g. ((perimeter/4.0)/sqrt(area)) > 2), as sketched below.
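
For illustration, the sliver test described above might look like this on a GeoSeries of candidate slices (the thresholds are just the example values given, and the helper name is illustrative):

def is_sliver(geoms, min_area=500.0, max_shape_ratio=2.0):
    # flag slices that are too small or too elongated to keep as child parcels
    area = geoms.area
    perimeter = geoms.length
    return (area < min_area) | ((perimeter / 4.0) / area.pow(0.5) > max_shape_ratio)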

Parcels will be sliced so that boundaries align with those of key summary geographies and control geographies. Summary geographies are any geography that simulation results will be summarized at (i.e. disaggregate agents will be aggregated to this level). Slicing parcels along key boundaries ensures a clean accounting of land: parcels will nest cleanly within higher-level geographies. This is also useful during imputation: when small-area control totals must be met but parcel boundaries don't align with the control boundaries, unintended side effects can result.

The key idea is: geography.aggregate(parcel.land_area) should equal geography.land_area. Examples of situations encountered in the past where parcel slicing is useful:

  • Block-level controls are desired, but block boundaries are frequently inconsistent with parcel boundaries.

  • Zonal boundaries bisect a set of parcels, so the parcels are sliced to ensure a clean accounting of land up to the zone level (a key summary geography).

  • Zonal boundaries are smaller than parcels in an area with a very large parcel (so that some zones contain zero parcels and are thus undevelopable), and the large parcel is sliced into smaller parcels corresponding to zone boundaries.

  • A control geography contains a few parcels in their entirety, plus a number of other parcels where a significant proportion of their land area overlaps with the control geography but their centroids do not fall within it. When replicating agents/buildings to match the geography's control total, the resulting agents/buildings are allocated to sliced parcels so they do not get artificially stuffed into the few parcels that are entirely within the boundary.

Slicing and reshaping parcels appropriately has implications for the correct calculation of other spatial operations like proportion_overlap.

Apply regression equation function

Function to impute missing/invalid/outlying values using predictions from a regression model or other statistical model (e.g. Poisson). Applies a regression equation, which may have been estimated in UrbanSim or statsmodels.

Examples:


apply_regression(buildings, variable_to_impute, regression_equation, replacement_criteria)
apply_regression(buildings, 'residential_rent', regression_equation)
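
A hedged sketch of apply_regression, assuming the model is a fitted statsmodels (formula-based) results object whose .predict() accepts a DataFrame; the criteria string and column names are illustrative:

def apply_regression(df, column, model, criteria=None):
    # rows to impute: either explicitly flagged by `criteria`, or simply missing
    to_fix = df.eval(criteria) if criteria else df[column].isna()
    out = df.copy()
    out.loc[to_fix, column] = model.predict(df.loc[to_fix])
    return out

# e.g. buildings = apply_regression(buildings, 'residential_rent', rent_model,
#                                   criteria='residential_rent <= 0')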

Condo detection/merging function

Function used to detect condo ownership records that have been stored as tiny parcels or stacked parcels. Ownership records are merged into a single building/parcel record with one geometry.
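
One simple way the stacked-parcel case could start, assuming a parcels GeoDataFrame in which condo ownership records share an identical footprint (column names and the input path are illustrative; the tiny-parcel case is not covered here):

import geopandas as gpd

parcels = gpd.read_file('parcels.shp')                          # illustrative input
parcels['footprint'] = parcels.geometry.apply(lambda g: g.wkb)  # hashable geometry key
stacked = parcels[parcels.duplicated('footprint', keep=False)]  # condo candidates
# collapse each stack of ownership records into one parcel/building record
merged = stacked.dissolve(by='footprint', aggfunc='first')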

targets/synthesis .pop() error

In targets/synthesis, when a count column is passed, adding or removing rows is handled by two functions: _add_rows_by_count or _remove_rows_by_count.

Inside these functions, there is a case that is not handled. From the df, after applying the corresponding filters, only the rows whose value in the count column is less than or equal to the amount (to add or remove) are saved into the sort_count array, which is then used to pick the to_add or to_remove indexes. The unhandled case is when the sort_count array is empty, meaning that all of the available rows in the filtered df have count values bigger than the amount value.
In the adding case, an error is raised at line 213
https://github.com/UDST/spandex/blob/master/spandex/targets/synthesis.py#L213
when .pop() is called on to_add while it is empty.

In the removing case, no error is raised, but the process iterates over the empty sort_count and the function ends up returning the same df.
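
A hypothetical, self-contained illustration of the failure mode (not the actual spandex code): when every candidate row's count exceeds the remaining amount, nothing is queued and .pop() raises.

counts = {10: 7, 11: 9}   # candidate row index -> value in the count column
amount = 5                # rows/units still needed
to_add = [idx for idx, cnt in counts.items() if cnt <= amount]   # -> [] in this case
while amount > 0 and to_add:   # the `and to_add` guard is what is currently missing
    idx = to_add.pop()         # without the guard this raises IndexError on an empty list
    amount -= counts[idx]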

Scale/synthesize to match aggregate totals

A common step in preparing UrbanSim base year data is ensuring that disaggregate data (e.g. buildings, households, jobs) matches aggregate targets at specified geographic levels. For example, in the case of disaggregate building data, we might want to match residential unit counts by block, match median home values by tract, match building year built by zipcode, or match non-residential-sqft totals by zone. To match totals for a given geography, we either synthesize new agents to match the total by sampling/copying/allocating existing agents within the geography, or we select agents within the geography for deletion. When matching an aggregate mean/median/total of some agent attribute in a certain geography, a scale-to-match approach can also be taken.

We want to be able to:

  • Synthesize new things in the disaggregate data to match an aggregate target value. Disaggregate things are replicated to match the target, and new things are then assigned a location_id within the aggregate geography (i.e. an allocation step).
  • Scale values in the disaggregate data to match an aggregate target value.

A control_to_target function is envisioned that takes the following arguments: agent_df, controls_df, and optionally allocation_geography_df. There are similarities between what this function would do and what the existing transition model does, as examples below will show. There are also similarities between this function and an UrbanSim refinement model. If we want the code to live in UrbanSim instead of Spandex, that is fine.

The operation of this function can be illustrated by looking at fake control table examples. See below. A couple of points: The agent_type column may not be needed as the agent type is implied by the agent_df argument. The allocate_to column may not be needed because the allocate_to geography is implied by the allocation_geography_df argument.

Example 1

| location_type | location_id | agent_type | agent_accounting_attribute | agent_filter | target | how_match | allocate_to | allocation_weight | allocation_capacity_attribute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| zone | 1 | household | | | 1000 | synthesize | building | residential_units | residential_units |
| zone | 2 | household | | | 3000 | synthesize | building | residential_units | residential_units |
| zone | 3 | household | | | 2000 | synthesize | building | residential_units | residential_units |

In Example 1, we match household totals by zone and allocate to buildings within the zone according to the distribution of residential units, respecting a capacity constraint. If zone 1 contains fewer than 1,000 households, we randomly sample the needed number of new households from the existing households in zone 1, copy them, then allocate the new households to buildings in the zone (i.e. assign a building_id). If zone 1 contains more than 1,000 households, we randomly sample existing households for deletion. The agents_df argument in this case would be a DataFrame of households with a zone_id column. The controls_df argument would be the table shown above. The allocation_geography_df would be a DataFrame of buildings with a zone_id column. In the allocation step, we would respect the capacity constraint identified in the allocation_capacity_attribute column of the controls_df table (the number of households assigned to a building should not exceed the number of residential units in the building).
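
A rough pandas sketch of that flow for a single zone (column names are illustrative, the residential_units capacity cap is noted but not enforced, and this is not the proposed control_to_target implementation):

import numpy as np
import pandas as pd

def match_household_total(households, buildings, zone_id, target, seed=0):
    rng = np.random.default_rng(seed)
    in_zone = households[households.zone_id == zone_id]
    gap = target - len(in_zone)
    if gap > 0:
        # synthesize: copy randomly sampled existing households in the zone...
        new = in_zone.sample(gap, replace=True, random_state=seed).copy()
        new.index = np.arange(gap) + households.index.max() + 1  # fresh household ids
        # ...and allocate them to buildings weighted by residential_units
        # (a full implementation would also cap assignments at each building's units)
        zone_bldgs = buildings[buildings.zone_id == zone_id]
        weights = zone_bldgs.residential_units / zone_bldgs.residential_units.sum()
        new['building_id'] = rng.choice(zone_bldgs.index, size=gap, p=weights)
        return pd.concat([households, new])
    if gap < 0:
        # too many households: randomly sample existing ones for deletion
        drop = in_zone.sample(-gap, random_state=seed).index
        return households.drop(drop)
    return households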

Example 2

| location_type | location_id | agent_type | agent_accounting_attribute | agent_filter | target | how_match | allocate_to | allocation_weight | allocation_capacity_attribute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| zone | 1 | household | persons | income < 40000 | 5600 | synthesize | building | residential_units | residential_units |
| zone | 2 | household | persons | income < 40000 | 5600 | synthesize | building | residential_units | residential_units |
| zone | 3 | household | persons | income >= 50000 | 2000 | synthesize | building | residential_units | residential_units |

In Example 2, we populate both the agent_accounting_attribute and agent_filter columns in the control table. This means that the target value now refers to persons, not households, and the households we sample to meet this target must pass the agent_filter. Summing household.persons for households in zone 1 where household income is less than 40,000 should give 5,600. In other words, there are 5,600 people in zone 1 in households with income below 40,000.

Example 3

| location_type | location_id | agent_type | agent_accounting_attribute | agent_filter | target | how_match | allocate_to | allocation_weight | allocation_capacity_attribute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| zone | 1 | job | | | 500 | synthesize | building | non_residential_sqft | non_residential_sqft/250 |
| zone | 2 | job | | | 1200 | synthesize | building | non_residential_sqft | non_residential_sqft/250 |
| zone | 3 | job | | | 700 | synthesize | building | non_residential_sqft | non_residential_sqft/250 |

In Example 3, we want to match zonal job targets (agent_type == 'job') and allocate new jobs to buildings weighted by non_residential_sqft. The allocation_capacity_attribute reflects the assumption that each job spot takes up 250 sq ft, and we don't want to exceed the number of job spots in the buildings being allocated to. After running control_to_target, there will be 500 jobs with zone_id 1.

Example 4

| location_type | location_id | agent_type | agent_accounting_attribute | agent_filter | target | how_match | allocate_to | allocation_weight | allocation_capacity_attribute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| zone | 1 | job | | sector_id == 11 | 150 | synthesize | building | non_residential_sqft + 50*residential_units | non_residential_sqft/250 |
| zone | 2 | job | | sector_id == 11 | 500 | synthesize | building | non_residential_sqft + 50*residential_units | non_residential_sqft/250 |
| zone | 3 | job | | sector_id == 32 | 200 | synthesize | building | non_residential_sqft + 50*residential_units | non_residential_sqft/250 |

In Example 4, we control to job targets again, but now the agent_filter column is populated. In zone_id 1, the target of 150 applies only to jobs in sector 11. There should be 150 jobs in zone_id 1 with sector_id 11. Existing jobs in sector 11 and zone 1 are either copied or deleted to match this target. New jobs get a new, unique job_id.

Example 5

| location_type | location_id | agent_type | agent_accounting_attribute | agent_filter | target | how_match | allocate_to | allocation_weight | allocation_capacity_attribute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| zone | 1 | building | residential_units | | 800 | synthesize | parcel | parcel_sqft | parcel_sqft/500 |
| zone | 2 | building | residential_units | | 200 | synthesize | parcel | parcel_sqft | parcel_sqft/500 |
| zone | 3 | building | residential_units | | 350 | synthesize | parcel | parcel_sqft | parcel_sqft/500 |

In Example 5, we want to match residential_units targets. Summing building.residential_units for buildings in zone_id 1 should give 800. Existing buildings with residential units are sampled, copied, and allocated if the existing zonal residential unit count is too low; otherwise, residential buildings are sampled for deletion if the existing count is too high. We allocate new synthetic buildings to parcels, weighting the allocation by parcel_sqft and respecting the parcel_sqft/500 capacity constraint.

Example 6

| location_type | location_id | agent_type | agent_accounting_attribute | agent_filter | target | how_match | allocate_to | allocation_weight | allocation_capacity_attribute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| zone | 1 | building | non_residential_sqft | building_type_id == 5 | 30000 | synthesize | parcel | parcel_sqft | parcel_sqft/2 |
| zone | 2 | building | non_residential_sqft | building_type_id == 5 | 85000 | synthesize | parcel | parcel_sqft | parcel_sqft/2 |
| zone | 3 | building | non_residential_sqft | building_type_id == 5 | 72000 | synthesize | parcel | parcel_sqft | parcel_sqft/2 |

In Example 6, we match non_residential_sqft totals by zone for building_type_id 5. Note the agent_filter column.

Example 7

| location_type | location_id | agent_type | agent_accounting_attribute | agent_filter | target | how_match | allocate_to | allocation_weight | allocation_capacity_attribute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| parcel | 111 | job | | | 50 | synthesize | building | non_residential_sqft | non_residential_sqft/250 |
| parcel | 112 | job | | | 120 | synthesize | building | non_residential_sqft | non_residential_sqft/250 |
| parcel | 113 | job | | | 70 | synthesize | building | non_residential_sqft | non_residential_sqft/250 |

In Example 7, notice that the location_type is 'parcel' instead of 'zone'. We are matching parcel-level employment targets: there should be 50 jobs attached to buildings on the parcel with parcel_id 111.

Example 8

| location_type | location_id | agent_type | agent_accounting_attribute | agent_filter | target | how_match | allocate_to | allocation_weight | allocation_capacity_attribute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tract | 7 | household | income | | 70000 | scale_to_mean | | | |
| tract | 8 | household | income | | 84000 | scale_to_mean | | | |
| tract | 9 | household | income | | 39000 | scale_to_mean | | | |

In Example 8, we are scaling to match the target instead of synthesizing to match (see the how_match column). Here we scale households by tract to match the observed household mean income by tract. The average household income in tract 7 is 70,000 and we want the disaggregate data to reflect this. Notice that when scaling to match, new agents are not synthesized, so the "allocate_" columns are left blank.

Example 9

| location_type | location_id | agent_type | agent_accounting_attribute | agent_filter | target | how_match | allocate_to | allocation_weight | allocation_capacity_attribute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tract | 7 | building | year_built | building_type_id < 3 | 1995 | scale_to_median | | | |
| tract | 8 | building | year_built | building_type_id < 3 | 1978 | scale_to_median | | | |
| tract | 9 | building | year_built | building_type_id < 3 | 1925 | scale_to_median | | | |

In Example 9, we scale building year_built to match the observed tract median year built. In tract 7, we want the median year_built of buildings with building_type_id less than 3 to be '1995'. Note 'scale_to_median' in the how_match column.

Example 10

| location_type | location_id | agent_type | agent_accounting_attribute | agent_filter | target | how_match | allocate_to | allocation_weight | allocation_capacity_attribute |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| zone | 1 | building | non_residential_sqft | | 100000 | scale_to_sum | | | |
| zone | 2 | building | non_residential_sqft | | 40000 | scale_to_sum | | | |
| zone | 3 | building | non_residential_sqft | | 75000 | scale_to_sum | | | |

In Example 10, we scale building non_residential_sqft to match a zonal target for non_residential_sqft. We want there to be 100,000 square feet of non-residential space in zone 1, and we want to match this target by scaling existing building records with non-residential sqft instead of synthesizing new building records. We scale existing values downwards or upwards depending on whether the zonal target is currently exceeded or short. Note 'scale_to_sum' in the how_match column.
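
The scale_to_sum case in Example 10 reduces to a per-zone multiplier; a minimal sketch with assumed (toy) DataFrame and column names:

import pandas as pd

buildings = pd.DataFrame({'zone_id': [1, 1, 2, 3],               # toy disaggregate data
                          'non_residential_sqft': [60_000, 20_000, 50_000, 100_000]})
targets = pd.Series({1: 100_000, 2: 40_000, 3: 75_000})          # zonal control totals

zonal_sums = buildings.groupby('zone_id').non_residential_sqft.sum()
factor = buildings.zone_id.map(targets / zonal_sums)             # per-zone scale factor
buildings['non_residential_sqft'] *= factor                      # zonal sums now match targets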

Support custom projections

Discussed with Eddie. Agreed to implement after working on higher priorities.

---------- Forwarded message ----------
From: Dara Adib
Date: Mon, Sep 29, 2014 at 4:50 PM
Subject: Re: North Carolina sample shp
To: Conor Henley
Cc: Eddie Janowicz, Fletcher Foti

[...]

We want to figure out how to deal with non-standard ESRI projections (SRIDs). Current versions of PostGIS only ship with EPSG codes in the spatial_ref_sys table. The prj2epsg API that we use to find SRIDs (if GDAL fails to find an SRID, which it usually does for anything other than WGS84) only matches to EPSG codes.

Here are some non-standard ESRI projections:
http://spatialreference.org/ref/esri/
http://svn.osgeo.org/gdal/trunk/gdal/data/esri_extra.wkt

There are some ways to populate spatial_ref_sys with these extra projections:
http://community.actian.com/wiki/Spatial_ref_sys
https://gis.stackexchange.com/questions/95831/how-can-i-get-proj4text-from-srtext
http://suite.opengeo.org/opengeo-docs/dataadmin/pgBasics/projections.html

spatial_ref_sys defines projections with two columns, srtext (well-known text) and proj4text (PROJ.4), which is used for reprojections. The prj file shipped with the shapefile includes srtext, but not proj4text.

OGR/GDAL can apparently determine the proj4text from the srtext in the prj file.

$ gdalsrsinfo pittzone.prj

PROJ.4 : '+proj=lcc +lat_1=34.33333333333334 +lat_2=36.16666666666666
+lat_0=33.75 +lon_0=-79 +x_0=609601.2199999997 +y_0=0 +datum=NAD83
+units=us-ft +no_defs '

OGC WKT :
PROJCS["NAD_1983_StatePlane_North_Carolina_FIPS_3200_Feet",
    GEOGCS["GCS_North_American_1983",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS_1980",6378137.0,298.257222101]],
        PRIMEM["Greenwich",0.0],
        UNIT["Degree",0.0174532925199433]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["False_Easting",2000000.002616666],
    PARAMETER["False_Northing",0.0],
    PARAMETER["Central_Meridian",-79.0],
    PARAMETER["Standard_Parallel_1",34.33333333333334],
    PARAMETER["Standard_Parallel_2",36.16666666666666],
    PARAMETER["Latitude_Of_Origin",33.75],
    UNIT["Foot_US",0.3048006096012192]]

I think QGIS does something similar when it defines custom projections for shapefiles that don't use standard ones:
http://docs.qgis.org/2.2/en/docs/user_manual/working_with_projections/working_with_projections.html

Perhaps when loading shapefiles with unrecognized SRIDs, we should define a custom projection, so that we can load and reproject. Thoughts?

Alternatively, we can preload some ESRI projections, but I don't like this solution because it doesn't cover all possible custom projections and requires specifying the SRID when loading the shapefile (since prj2epsg doesn't detect it).
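
For reference, the same srtext-to-proj4text conversion that gdalsrsinfo performs can be sketched with the GDAL/OGR Python bindings (the file path is illustrative; whether spandex should then insert the result into spatial_ref_sys under a custom SRID is the open question above):

from osgeo import osr

with open('pittzone.prj') as f:
    srs = osr.SpatialReference()
    srs.ImportFromESRI([f.read()])   # handles ESRI-flavored WKT from a .prj file

proj4text = srs.ExportToProj4()      # for the proj4text column
srtext = srs.ExportToWkt()           # for the srtext column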

Spandex/synthesis container size when count is not None

In spandex/targets/synthesis, when a count column is passed, the target value is matched by the sum of the count column rather than by counting rows. In the allocation process, however, the size of the containers (the container_size array) is calculated by subtracting the number of rows in each geo_id_col group from the corresponding value of the capacity column (or capacity expression). The behavior a user expects when passing a count column is that container size is also calculated from the sum of the count-column values belonging to each geo_id_col element.
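
A small, hypothetical illustration of the difference (not the spandex code):

import pandas as pd

df = pd.DataFrame({'geo_id': [1, 1, 2], 'count': [3, 2, 4]})   # e.g. buildings with unit counts
capacity = pd.Series({1: 10, 2: 5})                            # capacity per container

rows_used = df.groupby('geo_id').size()                 # what the code measures today
counts_used = df.groupby('geo_id')['count'].sum()       # what a user passing `count` expects

container_size_now = capacity - rows_used               # {1: 8, 2: 4}
container_size_expected = capacity - counts_used        # {1: 5, 2: 1}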

Set/assert value function

Function used to make and track one-off fixes/assertions/look-up-based-corrections to the data.

Examples:


assert(buildings, 'sqft_per_unit','>250')
assert(buildings, 'non_residential_sqft',0,'building_type_id=1')
assert(buildings, 'non_residential_sqft','footprint_area*stories','building_type_id>2')
assert(parcels, 'land_use_type_id',10,'parcel_id==2314')
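
A hedged sketch of such a helper on a pandas DataFrame (named set_value here only to avoid Python's assert keyword; the filter and value expressions follow the examples above):

def set_value(df, column, value, where=None):
    """Set `column` to `value` (a scalar, or an expression string evaluated
    against the frame) on rows matching the optional `where` filter."""
    mask = df.eval(where) if where else slice(None)
    values = df.eval(value) if isinstance(value, str) else value
    out = df.copy()
    out.loc[mask, column] = values
    return out

# e.g. parcels = set_value(parcels, 'land_use_type_id', 10, where='parcel_id == 2314')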

Synthesize to match function

Pandas-based function to synthesize new things in the disaggregate data to match an aggregate target value. Disaggregate things are replicated to match the target, and new things are then assigned a location_id within the aggregate geography (i.e. an allocation step).

Examples:


synthesize_to_match(buildings, 'residential_units','block_id', census_block_counts)
synthesize_to_match(buildings, 'residential_units','tract_id', acs_sf_unit_estimate)
synthesize_to_match(buildings, 'non_residential_sqft','zone_id',sqft_implied_from_employment)
synthesize_to_match(buildings, 'residential_units','tract_id', mf_units, allocation_criteria)

Clarify the role of conventions so that users know whether spandex requires them to use exec_sql to edit databases in all cases

It's unclear to me as a user whether I can edit a database that Spandex is using/managing outside of the Spandex ORM without breaking the Spandex database class's understanding of the schema.

In short, after dropping and then re-adding a table in psql, the Spandex database class does not seem to show any of the existing tables in my public schema when I inspect the database with it.

The long version:
This is a pretty simple and use-case-specific SQL query that takes a few minutes:

https://github.com/synthicity/spandex/blob/master/spandex/spatialtoolz.py#L495-L513

It works fine when you run this script from start to finish:

https://github.com/synthicity/bayarea_urbansim/blob/master/data_regeneration/run.py

Unfortunately, running that script from start to finish takes more than 12 hours on a well-provisioned (and tuned) machine.

As a user, it would be nice to be able to just call that specific SQL query on an arbitrary table. However, it seems that there may be some conventions or dependencies that I am not following in calling it.

In particular, I suspect that I am getting an error because I am not calling that function on a table that was specifically created or registered with one of the several ORMs (two, if you count Spandex as an ORM) that seem to be in use in this repository.

The error is below. As a user, this means I will probably rewrite the query from the ORM language into SQL in order to accomplish my larger goal of reducing the run time of data regeneration.

---geom_aggregation1 took 7.37857508659 seconds ---
/home/vagrant/anaconda/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py:2079: SAWarning: Did not recognize type 'bpchar' of column 'county_id'
  name, format_type, default, notnull, domains, enums, schema)
/home/vagrant/anaconda/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py:2079: SAWarning: Did not recognize type 'unknown' of column 'imputation_flag'
  name, format_type, default, notnull, domains, enums, schema)
PARCEL AGGREGATION:  Merge geometries (and aggregate attributes) based on within-interior-ring status
Traceback (most recent call last):
  File "geom_aggregation2.py", line 16, in <module>
    df = geom_unfilled(t.public.parcels, 'unfilled')
AttributeError: type object 'public' has no attribute 'parcels'
Traceback (most recent call last):
  File "geom_test.py", line 30, in <module>
    check_run('geom_aggregation2.py')
  File "geom_test.py", line 22, in check_run
    return subprocess.check_call([python, path])
  File "/home/vagrant/anaconda/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/vagrant/anaconda/bin/python', 'geom_aggregation2.py']' returned non-zero exit status 1

Additional raster functions

Populate parcel fields with values from raster datasets.

Examples:


aggregate(parcels, slope, how='mean')
aggregate(parcels, altitude, how='centroid')
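
One way the 'mean' case could look using the rasterstats package (an assumption; spandex might instead lean on PostGIS raster functions), with illustrative input paths:

import geopandas as gpd
from rasterstats import zonal_stats

parcels = gpd.read_file('parcels.shp')
stats = zonal_stats(parcels, 'slope.tif', stats=['mean'])   # one dict per parcel
parcels['slope_mean'] = [s['mean'] for s in stats]

# the how='centroid' case could instead sample the raster at each parcel centroid,
# e.g. with rasterstats.point_query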

Extract building data from parcels function

Function that extracts building data from parcel data, for the case where building information is embedded in the parcel data.

Example:


buildings = extract_buildings(parcels)
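
A rough sketch under the assumption that each parcel row carries at most one building's attributes as columns (column names illustrative):

def extract_buildings(parcels, cols=('year_built', 'building_sqft', 'residential_units')):
    buildings = parcels[list(cols)].copy()
    buildings['parcel_id'] = parcels.index        # keep the link back to the parent parcel
    buildings = buildings.reset_index(drop=True)  # new sequential building ids
    buildings.index.name = 'building_id'
    return buildings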

Scale to match function

Pandas-based function to scale values in the disaggregate data to match an aggregate target value.

Examples:


scale_to_match(buildings, 'unit_price', 'tract_id', median_home_values)
scale_to_match(buildings, 'year_built', 'bg_id', acs_mean_year_built)
scale_to_match(buildings, 'non_residential_sqft','zone_id', sqft_implied_from_employment, type='sum')

Clear attributes function

Function used to clear attributes/agents from parcels. Applied when land is to be treated off-model or when land is known to be vacant.

Examples:


clear_attributes(parcels, vacant)
clear_attributes(parcels, gov_land)
clear_attributes(parcels, tribal_land)
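
A possible shape for this, assuming the second argument is a boolean mask or index of parcels to clear and that "clearing" means zeroing development-related columns (all names illustrative):

def clear_attributes(parcels, which, cols=('residential_units', 'non_residential_sqft',
                                           'improvement_value')):
    out = parcels.copy()
    out.loc[which, list(cols)] = 0   # treat the selected parcels as vacant land
    return out

# e.g. parcels = clear_attributes(parcels, parcels.land_use_type_id == GOV_LAND_TYPE)  # illustrative mask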
