pyriv's People

Contributors: hwright-ucsb, jkibele, rrcarlson

pyriv's Issues

Clean up accidental commits

There are a number of files that seem like they probably shouldn't be part of the repo. I've already removed a couple (like .DS_Store), but I think there are more. For example:

  • graphsavetester.py
  • kusko_prunedtry0.graphml
  • .travis.yml
  • .edtorconfig

I generally try not to use `git add --all` because this always happens to me. In most cases, I prefer to add files explicitly.

GraphBuilder redesign

Following up on this issue, I've decided I need to redesign the pyriv.graph_prep module and GraphBuilder object a bit. Basically, GraphBuilder needs to be able to handle the following tasks:

  1. Network Creation
    1. River Network
      1. Data import (shp or graphpickle for now)
      2. Addition of coastline (to identify river mouths)
      3. Identification of river mouths and inland dead ends
      4. Pruning useless bits off the graph
      5. Fixing unintended breaks
      6. Node coordinate rounding
    2. Coast Network
      1. Options for handling lines or polygons (multipart and single part)
      2. Simplification and node count metrics for prediction of processing time
      3. Edge creation (multiprocessor command line scripts?)
    3. Network Join

In order to do this across all likely scenarios, it has to account for the following cases:

  1. When the river mouths connect perfectly to the coastline (simplest)
  2. When the river mouths are close enough to be auto completed to the coastline
  3. When the connection segments must be created and added in.
  4. Any combination of the previous cases.

I think the workflow needs to look something like this:

  1. Initialize GraphBuilder object with river shp, coastline (polygon) shp, and river_mouth_tolerance value.
  2. Get feedback on the inputs with potential problems flagged. Essentially, river network dead ends must be categorized:
    • River mouths (connected to coastline)
    • Dead ends:
      • Ocean (river dead ends outside land polygon)
      • Auto complete candidates (river dead end on land, within tolerance from coastline)
      • True dead ends (river dead end on land, beyond tolerance from coastline). These can be unintentional river network breaks or candidates for hand drawn connection to the coastline.
  3. Offer functions for:
    • Auto complete
    • Adding in drawn segments
    • Fixing network gaps
  4. Export of completed river graph to graph pickle format.
  5. Creation and export of coastal network
  6. Joining of river and coastal networks allowing for:
    • Ocean nodes
    • Coastal nodes
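
The dead-end categorization in step 2 can be sketched as a small pure function. All names and the exact tolerance semantics here are my assumptions, not pyriv's actual API:

```python
def classify_dead_end(on_coastline, on_land, dist_to_coast_km, tolerance_km):
    """Categorize a river-network dead end per the scheme above.

    on_coastline     -- node snaps exactly onto the coastline
    on_land          -- node falls inside the land polygon
    dist_to_coast_km -- distance from node to nearest coastline point
    """
    if on_coastline:
        return "river_mouth"
    if not on_land:
        return "ocean"                      # dead end out in the water
    if dist_to_coast_km <= tolerance_km:
        return "auto_complete_candidate"    # close enough to snap to coast
    return "true_dead_end"                  # network break, or candidate for
                                            # a hand-drawn coastline connection
```

With a function like this, step 3's repair tools (auto-complete, drawn segments, gap fixing) each just operate on one category of node.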

Parallel and/or Simplify for Land Graph

Without parallel processing, it took 2 hours and 20 minutes to generate a graph for ocean distances around Nunivak Island. Admittedly, my computer went to sleep during part of that, but it's way too slow regardless, especially considering that this one island is a tiny fraction of the nodes that will need to be used.

We need to consider two options for speeding things up:

  1. Use parallel processing for checking whether potential edges cross the land. This is done in a for loop, so it should be dead simple to implement. I just haven't done it before so it'll take me a bit to figure it out.

  2. Simplify the land graph. This would reduce the number of nodes we need to calculate for. However, we could run into problems if the simplification drops nodes that represent river mouths. This might make those river mouths end up on land. ...and will likely make them not line up perfectly with the actual river mouths. The difference would be negligible in terms of distance, but it would introduce problems with finding paths out of rivers. Possible solutions:

    1. Just don't simplify. This is what I'm leaning toward at the moment. It may take many hours to generate a graph, but we should be able to save the graph and re-use it so I'm not that bothered. ...and if we get it running parallel and do the processing on Aurora, it shouldn't take too long.
    2. Simplify a lot, but add the river mouth nodes back in before calculating the edges. The river mouth nodes are a relatively small subset of the full set of coastline points, so this would probably speed things up considerably.
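
Option 1 really is a near-trivial `map` over the candidate edges. A minimal sketch, using a generic 2-D segment-crossing test as a stand-in for the real land-polygon check; it uses the thread-backed `multiprocessing.dummy.Pool` so the snippet runs anywhere, but for this CPU-bound check the real speedup would come from swapping in `multiprocessing.Pool` with the same API:

```python
from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool

def _ccw(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(s1, s2):
    """True if the two 2-D segments strictly cross each other."""
    (p1, p2), (p3, p4) = s1, s2
    return (_ccw(p3, p4, p1) * _ccw(p3, p4, p2) < 0
            and _ccw(p1, p2, p3) * _ccw(p1, p2, p4) < 0)

def edge_blocked(edge, land_segments):
    """Does a candidate ocean edge cross any land-boundary segment?"""
    return any(segments_cross(edge, seg) for seg in land_segments)

land = [((0, 0), (0, 10))]                      # stand-in coastline segment
edges = [((-1, 5), (1, 5)), ((1, 1), (2, 2))]   # first crosses, second doesn't

with Pool() as pool:
    blocked = pool.map(lambda e: edge_blocked(e, land), edges)
print(blocked)  # [True, False]
```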

parallelize graph copying (deep)

networkx's graph copying is too slow for regular usage. Whenever you mutate a graph you should copy it and mutate the copy, and with a deep copy that is very, very costly.
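
Part of the cost is that a deep copy duplicates every node/edge attribute dict. A structural copy that rebuilds only the adjacency dicts while sharing the attribute values is usually much cheaper, and is all you need for adding or removing edges. A stdlib sketch on a plain dict-of-dicts adjacency structure:

```python
import copy

# toy adjacency structure: node -> neighbor -> edge-attribute dict
adj = {i: {i + 1: {"weight": float(i)}} for i in range(1000)}

deep = copy.deepcopy(adj)                          # duplicates everything
structural = {u: {v: d for v, d in nbrs.items()}   # new dicts, shared attrs
              for u, nbrs in adj.items()}

# the structural copy is safe for adding/removing edges, but NOT for
# mutating attribute dicts in place -- those are still shared:
assert structural[0][1] is adj[0][1]
assert deep[0][1] is not adj[0][1]
```

Whether this pattern can be applied depends on which mutations we actually perform; parallelizing a true deep copy itself is harder, since most of the cost is Python-object allocation.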

Land Object

The Land object will be a subclassed geopandas.GeoDataFrame. It needs the following attributes/methods:

  1. shrink
  2. simplify
  3. coords
  4. line_crosses
  5. generate_coastal_graph

Other methods that are part of the current version of the Land object should go into a CoastalGraph object.
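
As a rough interface sketch, here is a pure-Python stand-in for Land with `coords` and `line_crosses` implemented against a polygon exterior ring (the real version would subclass geopandas.GeoDataFrame and use shapely geometry; only the method names come from the list above, everything else is assumption):

```python
class Land:
    """Pure-Python sketch of the Land interface."""

    def __init__(self, ring):
        self.ring = list(ring)      # polygon exterior as (x, y) tuples

    def coords(self):
        return list(self.ring)

    def _segments(self):
        return [(self.ring[i], self.ring[i + 1])
                for i in range(len(self.ring) - 1)]

    @staticmethod
    def _cross(s1, s2):
        def ccw(a, b, c):
            return (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])
        (p1, p2), (p3, p4) = s1, s2
        return (ccw(p3, p4, p1) * ccw(p3, p4, p2) < 0
                and ccw(p1, p2, p3) * ccw(p1, p2, p4) < 0)

    def line_crosses(self, segment):
        """True if segment strictly crosses the land boundary."""
        return any(self._cross(segment, s) for s in self._segments())
```

`shrink`, `simplify`, and `generate_coastal_graph` would sit alongside these, delegating to the geometry library.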

Consider dropping edge attributes

We're currently importing all shapefile attributes as edge attributes in the graph. We're only using 'LengthKM' with the NHD data, but I think I'm not going to even use that with the National Map. ...and it's probably better to not assume we'll have that attribute anyway to make things more flexible. I've already got code to calculate distance edge weights.

So, since I think we're not going to use the attributes, it may make sense to get rid of them. I don't see an easy way to dump them out of the graph, so it might be easier to open the shapefile with geopandas, dump everything aside from geometry, and start from there.

So, basically, either: 1) find a way to drop the edge attributes or 2) write out a temporary version of the shapefile with no attributes and import that. I think option 1 is better if there's a way to do it.
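
Option 1 may be simpler than it looks: `G.edges(data=True)` hands back the live attribute dicts, so they can be cleared in place. A minimal sketch:

```python
import networkx as nx

G = nx.Graph()
G.add_edge("a", "b", LengthKM=3.2, FCode=46006)
G.add_edge("b", "c", LengthKM=1.1)

# clear every edge-attribute dict in place; distance weights can be
# recomputed afterwards from the geometry
for _, _, data in G.edges(data=True):
    data.clear()

assert all(d == {} for _, _, d in G.edges(data=True))
```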

Readme.rst change to readme.md

There are references to readme.rst all over the place in the setup files. I think these should be changed to readme.md but I need to check the implications of that change.

Make a `RivDist` report object

RiverGraph.river_distances currently returns a GeoDataFrame with multiple geometry columns. I will create a new RivDist object that subclasses GeoDataFrame. That way I can add save methods that write appropriate path or point shapefiles (a shapefile can only have one geometry column), as well as plot methods to display the results.
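
The subclassing pattern hinges on the `_constructor` property, which pandas (and hence geopandas) consults whenever an operation produces a new frame, so slices and copies stay `RivDist` instances. Illustrated here with plain pandas; the method name `longest` is hypothetical:

```python
import pandas as pd

class RivDist(pd.DataFrame):
    # pandas calls _constructor whenever an operation builds a new
    # frame, so derived frames keep the RivDist type (and its methods)
    @property
    def _constructor(self):
        return RivDist

    def longest(self, n=1):
        """Example report method (name/signature hypothetical)."""
        return self.nlargest(n, "dist_km")

rd = RivDist({"site": ["a", "b"], "dist_km": [12.5, 3.1]})
assert isinstance(rd.longest(), RivDist)
```

The save/plot methods would live on the subclass the same way, choosing one geometry column per output shapefile.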

GraphML Read/Write

It looks like reading graphml is a problem. I saved out a land graph and read it back in and found that the nodes came in as string representations of tuples instead of actual tuples. This causes a number of problems. gpickle may be better. Also, the graphml version was over twice as large.
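
If GraphML support is still wanted, the stringified tuple nodes can be recovered after reading with `ast.literal_eval` plus `nx.relabel_nodes` (which accepts a function as the mapping). gpickle avoids the round-trip problem entirely. A sketch, simulating the broken read:

```python
import ast
import networkx as nx

# simulate what comes back from read_graphml: tuple nodes as strings
G = nx.Graph()
G.add_edge("(0.0, 1.5)", "(2.0, 3.5)")

# parse each node label back into a real tuple
H = nx.relabel_nodes(G, ast.literal_eval)
assert (0.0, 1.5) in H
```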

Revisit setup.py

There's some stuff in there that I don't think is correct and/or relevant. ...entry points for instance...

Maybe Change the Name?

As this thing has developed and the coastal distance calculations have been included, it's become less focused on rivers. ...and as I become more aware of how extensively (if confusingly) developed river network navigation already is, I think it might make sense to change the name to reflect the fact that this library is not only focused on river navigation distances.

I'm thinking it should be something that points more directly toward calculating "swimmable distance" or "fish distance" (both phrases that I pull out of nowhere). I was thinking "swimpy", but that's taken. I don't know. Need to give it more thought.

Projections

Basically, distances derived from unprojected coordinate systems (e.g., WGS84 / EPSG:4326 lat/lon) can't be converted easily because the length of a degree of longitude varies with latitude; the effect grows the farther you are from the equator. If we were masochists, we could try to check inputs for projection and reproject, but that would be error prone and take us forever.

So, instead, we'll just have to be clear in the documentation that the inputs must be projected with proper distance, rather than angular, units. If we're feeling generous, maybe we can try to do a bit of introspection and throw some exceptions if people try to load data with angular units.
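
The "bit of introspection" could be as crude as a bounds check: coordinates that all fit inside the lon/lat degree ranges are almost certainly angular. This is a heuristic of my own devising, with false positives possible for tiny projected extents near the origin:

```python
def looks_geographic(coords):
    """Heuristic: True if every (x, y) fits in lon/lat degree ranges."""
    return all(-180 <= x <= 180 and -90 <= y <= 90 for x, y in coords)

def check_projected(coords):
    """Raise if the input looks like it carries angular units."""
    if looks_geographic(coords):
        raise ValueError(
            "Input looks unprojected (degree units); reproject to a "
            "distance-preserving CRS before building the network.")
```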

Look at using distance to coastline instead of buffering

Right now, I'm finding National Map coastal dead-end nodes by buffering the coastline and checking which dead ends intersect the buffer. It might be faster and more straightforward to compute the distance from each dead end to the coastline and simply threshold those values; that would also avoid the potential topology problems of buffering. I'll have to do some testing.
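
The thresholding approach boils down to a point-to-segment distance plus a comparison. A pure-Python stand-in (shapely's `distance` would do the geometric work in practice; names here are illustrative):

```python
import math

def point_seg_dist(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # project p onto the segment, clamped to its endpoints
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx*dx + dy*dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def coastal_dead_ends(dead_ends, coast_segments, tol):
    """Dead ends within tol of any coastline segment -- no buffering."""
    return [p for p in dead_ends
            if min(point_seg_dist(p, a, b) for a, b in coast_segments) <= tol]
```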

Overall redesign scheme

I think I need the following modules:

  1. GraphBuilder: Builds RiverGraphs and joins them to coastal graphs. Needs a river shapefile (or potentially a RiverGraph) and a Land (or optionally, a CoastalGraph) object as input.
  2. RiverGraph: Represents a directed network of rivers, built and pruned by GraphBuilder so that all dead ends are river mouths (other dead ends pruned). Has methods that find distance to river mouth for points (or sets of points).
  3. Land: Definitely takes a polygon. Maybe takes a polyline and converts to polygon? Can create a CoastalGraph.
  4. CoastalGraph: Methods for adding points and calculating edges (with buffer) and for finding distance between points.

Clean up and merge CoastDist branch

In my frantic effort to finish the river herring distance calculations for the east coast of the US, I've made a huge mess of the CoastDist branch. I've made some changes to pyriv.coastal and pyriv.river_graph, but mostly I've put stuff into the poorly named pyriv.rg_light module. This was sort of intended to be a stripped down (in that it doesn't really use the coastline the same way) version of river_graph, but it's kind of not that now. It's a mess.

I've got a couple of jupyter notebooks that demonstrate how the code was used for the east coast stuff. I think the goal for resolving this issue should be to (unfortunately) pretty much refactor the code so that there's a more sensible workflow possible. There are two main tasks here that pyriv is trying to accomplish:

  1. Build the network used to calculate distances. This includes the following types of distances: between sites on a single river network (i.e., no river mouth between sites); between sites on different river networks (i.e., from site to river mouth + ocean distance + upriver distance if site is up a different river); between sites on the coast.
  2. Actually calculate the distances using the built network.
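
Once task 1 has put river and coastal edges into one weighted graph, all three distance types in task 2 (same river, different rivers via the ocean, coast-to-coast) fall out of a single shortest-path call. A toy sketch with networkx; node names and weights are invented:

```python
import networkx as nx

G = nx.Graph()
G.add_edge("site_a", "mouth_1", weight=4.0)    # river 1: site upriver of mouth
G.add_edge("mouth_1", "coast_pt", weight=6.0)  # coastal edges between mouths
G.add_edge("coast_pt", "mouth_2", weight=5.0)
G.add_edge("mouth_2", "site_b", weight=3.0)    # river 2: upriver segment

# site-to-mouth + ocean distance + upriver distance, in one call
d = nx.shortest_path_length(G, "site_a", "site_b", weight="weight")
print(d)  # 18.0
```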

Task 1 is going to be potentially complex and annoying. I don't think there's any way to make that truly user-friendly. There's too much variability in the potential input data sets (e.g., hydrographic network shapefiles, coastline features). Task 2, on the other hand, could be made fairly simple and could go into some sort of easy-to-use GUI (e.g., QGIS processing toolbox or ESRI Python toolbox).

I'm going to see if I can paste my jupyter notebooks for the east coast calculations up here.

Units

We need to think a bit about how we handle unit conversions. ...pretty much just length at this point. I'm currently inclined toward using numericalunits. It's a single module, no dependencies, pip installable, easy to use, and will help us check for unit-conversion f#$k-ups.

However, it will not help us deal with unprojected coordinates. In other words, it can't convert between degrees longitude and kilometers. That's a bit of a separate issue. ...so I'll start a separate issue about projections.
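
Whatever library we land on, the safest internal convention is a single canonical unit with conversions only at the boundaries. A stdlib sketch of that convention (factor table and function name are my own, not numericalunits' API):

```python
# metres per unit; extend as needed
M_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}

def to_km(value, unit):
    """Convert a length in `unit` to kilometres (KeyError on unknown units)."""
    return value * M_PER_UNIT[unit] / 1000.0
```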

RiverGraph redesign

So, the RiverGraph object currently includes the coastline because it needs a concept of "river mouth" that is distinct from just a dead end. It might be cleaner, if possible, to use GraphBuilder to generate a RiverGraph that is pruned of any extraneous dead ends. Then RiverGraph could treat all dead ends as river mouths in distance calculations.

Essentially, I think it would make more sense to have the RiverGraph object only do river related things (like calculate the distances).
