jkibele / pyriv

Python library for analysis of minimum aquatic distance across rivers and coasts. It's like Google Maps for anadromous fish.

License: MIT License
There are a number of files that seem like they probably shouldn't be part of the repo. I've already removed a couple (like `.DS_Store`), but I think there are more. For example:

I generally try not to use `git add --all` because this always happens to me. In most cases, I like to just add files explicitly.
Following up on this issue, I've decided I need to redesign the `pyriv.graph_prep` module and the `GraphBuilder` object a bit. Basically, `GraphBuilder` needs to be able to handle the following tasks:

In order to do this across all likely scenarios, it has to account for the following cases:

I think the workflow needs to look something like this: instantiate the `GraphBuilder` object with a river shp, a coastline (polygon) shp, and a `river_mouth_tolerance` value.

develop unit tests for pyriv
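A hedged sketch of that instantiation step. None of these parameter names are confirmed by the pyriv source; they are placeholders illustrating the intended interface:

```python
# Hypothetical sketch only: the real GraphBuilder signature may differ.
class GraphBuilder:
    def __init__(self, river_shp, coast_shp, river_mouth_tolerance=0.5):
        # Paths to the river and coastline (polygon) shapefiles.
        self.river_shp = river_shp
        self.coast_shp = coast_shp
        # Max distance (in the shapefile's projected units) for a river
        # dead end to count as a river mouth on the coast.
        self.river_mouth_tolerance = river_mouth_tolerance

# Invented file names, for illustration only.
gb = GraphBuilder("rivers.shp", "coast.shp", river_mouth_tolerance=0.25)
```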
Without parallel processing, it took 2 hours and 20 minutes to generate a graph for ocean distances around Nunivak Island. Admittedly, my computer went to sleep during part of that, but it's way too slow regardless. Especially when you consider that that one island is a tiny fraction of the nodes that will need to be used.
We need to consider two options for speeding things up:
Use parallel processing for checking whether potential edges cross the land. This is done in a for loop, so it should be dead simple to implement. I just haven't done it before so it'll take me a bit to figure it out.
Simplify the land graph. This would reduce the number of nodes we need to calculate for. However, we could run into problems if the simplification drops nodes that represent river mouths. This might make those river mouths end up on land. ...and will likely make them not line up perfectly with the actual river mouths. The difference would be negligible in terms of distance, but it would introduce problems with finding paths out of rivers. Possible solutions:
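A minimal sketch of the first option, with a pure-Python segment-intersection test standing in for the shapely land-crossing predicate so the example is self-contained. The shoreline and candidate edges are toy data:

```python
from multiprocessing import Pool

def _ccw(a, b, c):
    # Counter-clockwise orientation test for three points.
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 strictly crosses segment p3-p4."""
    return (_ccw(p1, p3, p4) != _ccw(p2, p3, p4)
            and _ccw(p1, p2, p3) != _ccw(p1, p2, p4))

# Land represented as a list of shoreline segments (toy data).
LAND_SEGMENTS = [((0.0, 0.0), (10.0, 0.0))]  # one horizontal shoreline

def crosses_land(edge):
    """Check one candidate edge against every land segment."""
    a, b = edge
    return any(segments_cross(a, b, s, e) for s, e in LAND_SEGMENTS)

candidate_edges = [
    ((1.0, -1.0), (1.0, 1.0)),  # crosses the shoreline
    ((2.0, 1.0), (8.0, 1.0)),   # stays offshore
]

if __name__ == "__main__":
    # The for loop over candidate edges becomes a parallel map.
    with Pool(2) as pool:
        results = pool.map(crosses_land, candidate_edges)
```

Since each edge check is independent, `Pool.map` over the candidate list is the obvious drop-in replacement for the existing for loop.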
networkx's copying is too slow for regular usage: whenever you mutate a graph, you should copy it and mutate the copy, and this is very, very costly.
The `Land` object will be a subclassed `geopandas.GeoDataFrame`. It needs the following attributes/methods:

Other methods that are part of the current version of the `Land` object should go into a `CoastalGraph` object.
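A hedged sketch of the subclassing mechanics only. Plain pandas is used here to keep the example dependency-light; a `geopandas.GeoDataFrame` subclass uses the same `_constructor` hook. The `length_km` column and `total_length_km` method are invented for illustration:

```python
import pandas as pd

class Land(pd.DataFrame):
    @property
    def _constructor(self):
        # Ensures pandas operations return Land objects, not bare DataFrames.
        return Land

    def total_length_km(self):
        # Invented convenience method; a real Land would work on geometry.
        return float(self["length_km"].sum())

land = Land({"length_km": [1.5, 2.5]})
subset = land[["length_km"]]  # still a Land, thanks to _constructor
```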
We're currently importing all shapefile attributes as edge attributes in the graph. We're only using 'LengthKM' with the NHD data, but I think I'm not going to even use that with the National Map. ...and it's probably better to not assume we'll have that attribute anyway to make things more flexible. I've already got code to calculate distance edge weights.
So, since I think we're not going to use the attributes, it may make sense to get rid of them. I don't see an easy way to dump them out of the graph, so it might be easier to open the shapefile with geopandas, dump everything aside from geometry, and start from there.
So, basically, either: 1) find a way to drop the edge attributes or 2) write out a temporary version of the shapefile with no attributes and import that. I think option 1 is better if there's a way to do it.
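For what it's worth, option 1 may be simpler than it looks: networkx edge attributes live in per-edge dicts, so they can be dropped in place by clearing each dict. A small sketch, with `LengthKM` and `FCode` standing in for NHD shapefile attributes:

```python
import networkx as nx

# Toy graph with shapefile-style edge attributes.
G = nx.Graph()
G.add_edge((0.0, 0.0), (0.0, 1.0), LengthKM=3.2, FCode=46006)

# Clear every edge's attribute dict in place; distance weights can then
# be recomputed from node geometry afterwards.
for u, v, data in G.edges(data=True):
    data.clear()
```

If this in-place approach works, the temporary attribute-free shapefile in option 2 shouldn't be needed.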
Link Canadian SASAP regions for our AK dataset
How do we want to modularize and separate functionalities/objects
complete PEP8 documentation and necessary docstring tests for pyriv modules
The current method of using the `data` keyword argument seems awkward, and I like the way the `CoastLine` class handles it better.
Given a network representation of a shapefile, find all dead-end nodes, attribute them as coastal or inland, and return a GeoDataFrame. The GeoDataFrame can then be used to visualize directly or can be exported to a shapefile. I've got an example on the NCEAS GitHub: https://github.nceas.ucsb.edu/gist/jkibele/979f9d1745ed3bf8086cccea048270ff
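The dead-end search itself can be sketched with networkx (toy node IDs; the coastal/inland attribution against the coastline is omitted):

```python
import networkx as nx

# In an undirected shapefile network, a dead end is simply a node of
# degree 1: nodes 0, 3, and 4 below.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (1, 4)])
deadends = [n for n in G.nodes if G.degree(n) == 1]
```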
There are references to readme.rst all over the place in the setup files. I think these should be changed to readme.md but I need to check the implications of that change.
`RiverGraph.river_distances` currently returns a `GeoDataFrame` object with multiple geometry columns. I will create a new `RivDist` object that subclasses `GeoDataFrame`. That way I can add some save methods that will write appropriate path or point shapefiles (a shapefile can only have one geometry column). I can also add some plot methods to display the results.
It looks like reading graphml is a problem. I saved out a land graph and read it back in and found that the nodes came in as string representations of tuples instead of actual tuples. This causes a number of problems. gpickle may be better. Also, the graphml version was over twice as large.
There's some stuff in there that I don't think is correct and/or relevant. ...entry points for instance...
As this thing has developed and the coastal distance calculations have been included, it's become less focused on rivers. ...and as I become more aware of how extensively (if confusingly) developed river network navigation already is, I think it might make sense to change the name to reflect the fact that this library is not only focused on river navigation distances.
I'm thinking it should be something that points more directly toward calculating "swimmable distance" or "fish distance" (both phrases that I pull out of nowhere). I was thinking "swimpy", but that's taken. I don't know. Need to give it more thought.
Basically, distances derived from unprojected coordinate systems (i.e., WGS84, epsg: 4326, lat/lon, etc.) can't be easily converted because the length of a degree of longitude varies with latitude. This will be more apparent the farther you are from the equator. If we were masochists, we could try to check inputs for projection and reproject, but that would be error prone and take us forever.
So, instead, we'll just have to be clear in the documentation that the inputs must be projected with proper distance, rather than angular, units. If we're feeling generous, maybe we can try to do a bit of introspection and throw some exceptions if people try to load data with angular units.
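The latitude effect described above is easy to quantify on a spherical Earth (radius ~6371 km): one degree of longitude spans cos(latitude) times its equatorial length, so naive lat/lon "distances" are badly distorted away from the equator.

```python
import math

R_KM = 6371.0  # mean Earth radius, spherical approximation

def degree_of_longitude_km(lat_deg):
    # Arc length of one degree of longitude at a given latitude.
    return math.cos(math.radians(lat_deg)) * math.pi * R_KM / 180.0

# ~111.2 km at the equator, but only about half that at 60 degrees N
# (roughly the latitude of Nunivak Island).
```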
Right now, I'm finding National Map coastal dead end nodes by buffering the coastline and checking for intersection with all deadends. It might be faster and more straight-forward to find the distance of all deadends to the coastline and just threshold those values. This could prevent potential topology problems with buffering and seems like it could be faster. I'll have to do some testing.
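A sketch of the threshold idea, with a pure-Python point-to-segment distance standing in for shapely's `Point.distance` against the coastline (toy coordinates; the tolerance value is invented):

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

coast = ((0.0, 0.0), (10.0, 0.0))  # toy coastline segment
deadends = {"river_mouth": (4.0, 0.05), "inland": (5.0, 3.0)}
tol = 0.1  # invented tolerance, in projected units

coastal = {name for name, p in deadends.items()
           if point_segment_distance(p, *coast) <= tol}
```

Thresholding the raw distances avoids constructing (and validating) a buffer polygon entirely, which is where the potential topology problems come from.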
I think I need the following modules:
`GraphBuilder`: Builds `RiverGraph`s and joins them to coastal graphs. Needs a river shapefile (or potentially a `RiverGraph`) and a `Land` (or optionally, a `CoastalGraph`) object as input.
`RiverGraph`: Represents a directed network of rivers, built and pruned by `GraphBuilder` so that all dead ends are river mouths (other dead ends pruned). Has methods that find distance to river mouth for points (or sets of points).
`Land`: Definitely takes a polygon. Maybe takes a polyline and converts to polygon? Can create a `CoastalGraph`.
`CoastalGraph`: Methods for adding points and calculating edges (with buffer) and for finding distance between points.

In my frantic effort to finish the river herring distance calculations for the east coast of the US, I've made a huge mess of the CoastDist branch. I've made some changes to `pyriv.coastal` and `pyriv.river_graph`, but mostly I've put stuff into the poorly named `pyriv.rg_light` module. This was sort of intended to be a stripped-down version of `river_graph` (in that it doesn't really use the coastline the same way), but it's kind of not that now. It's a mess.
I've got a couple of jupyter notebooks that demonstrate how the code was used for the east coast stuff. I think the goal for resolving this issue should be to (unfortunately) pretty much refactor the code so that there's a more sensible workflow possible. There are two main tasks here that pyriv is trying to accomplish:
Task 1 is going to be potentially complex and annoying. I don't think there's any way to truly make that user-friendly. There's too much variability in the potential input data sets (e.g., hydrographic network shapefiles, coastline features). Task 2, on the other hand, could be made fairly simple and could go into some sort of easy-to-use GUI (e.g., the QGIS Processing Toolbox or an ESRI Python toolbox).
I'm going to see if I can paste my jupyter notebooks for the east coast calculations up here.
change to notify user
We need to think a bit about how we handle unit conversions. ...pretty much just length at this point. I'm currently inclined toward using numericalunits. It's a single module, no dependencies, pip installable, easy to use, and will help us check for unit-conversion f#$k-ups.
However, it will not help us deal with unprojected coordinates. In other words, it can't convert between degrees longitude and kilometers. That's a bit of a separate issue. ...so I'll start a separate issue about projections.
So, the `RiverGraph` object currently includes the coastline because it needs a concept of "river mouth" that is distinct from just a dead end. It might be cleaner, if possible, to use `GraphBuilder` to generate a `RiverGraph` that is pruned of any extraneous dead ends. Then `RiverGraph` could treat all dead ends as river mouths in distance calculations.

Essentially, I think it would make more sense to have the `RiverGraph` object only do river-related things (like calculate the distances).
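The pruning idea might look something like this (toy directed network with invented node names; edges point downstream, so a dead end is a node with out-degree 0):

```python
import networkx as nx

def prune_deadends(G, river_mouths):
    """Repeatedly remove downstream dead ends that are not river mouths,
    so that every surviving dead end can be treated as a river mouth."""
    G = G.copy()
    while True:
        doomed = [n for n in G.nodes
                  if G.out_degree(n) == 0 and n not in river_mouths]
        if not doomed:
            return G
        G.remove_nodes_from(doomed)

# Two headwaters join and flow to "mouth"; one stub dead-ends inland.
G = nx.DiGraph([("hw1", "junction"), ("hw2", "junction"),
                ("junction", "mouth"), ("junction", "stub")])
pruned = prune_deadends(G, river_mouths={"mouth"})
```

After pruning, the only out-degree-0 node left is the actual river mouth, so distance-to-mouth calculations no longer need the coastline to tell mouths apart from other dead ends.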