pyriv's People

Contributors: hwright-ucsb, jkibele, rrcarlson

pyriv's Issues

Clean up accidental commits

There are a number of files that seem like they probably shouldn't be part of the repo. I've already removed a couple (like .DS_Store), but I think there are more. For example:

  • graphsavetester.py
  • kusko_prunedtry0.graphml
  • .travis.yml
  • .edtorconfig

I generally try not to use `git add --all` because this always happens to me. In most cases, I prefer to add files explicitly.

GraphBuilder redesign

Following up on this issue, I've decided I need to redesign the pyriv.graph_prep module and GraphBuilder object a bit. Basically, GraphBuilder needs to be able to handle the following tasks:

  1. Network Creation
    1. River Network
      1. Data import (shp or graphpickle for now)
      2. Addition of coastline (to identify river mouths)
      3. Identification of river mouths and inland dead ends
      4. Pruning useless bits off the graph
      5. Fixing unintended breaks
      6. Node coordinate rounding
    2. Coast Network
      1. Options for handling lines or polygons (multipart and single part)
      2. Simplification and node count metrics for prediction of processing time
      3. Edge creation (multiprocessor command line scripts?)
    3. Network Join

In order to do this across all likely scenarios, it has to account for the following cases:

  1. When the river mouths connect perfectly to the coastline (simplest)
  2. When the river mouths are close enough to be auto completed to the coastline
  3. When the connection segments must be created and added in.
  4. Any combination of the previous cases.

I think the workflow needs to look something like this:

  1. Initialize GraphBuilder object with river shp, coastline (polygon) shp, and river_mouth_tolerance value.
  2. Get feedback on the inputs with potential problems flagged. Essentially, river network dead ends must be categorized:
    • River mouths (connected to coastline)
    • Dead ends:
      • Ocean (river dead ends outside land polygon)
      • Auto complete candidates (river dead end on land, within tolerance from coastline)
      • True dead ends (river dead end on land, beyond tolerance from coastline). These can be unintentional river network breaks or candidates for hand drawn connection to the coastline.
  3. Offer functions for:
    • Auto complete
    • Adding in drawn segments
    • Fixing network gaps
  4. Export of completed river graph to graph pickle format.
  5. Creation and export of coastal network
  6. Joining of river and coastal networks allowing for:
    • Ocean nodes
    • Coastal nodes
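
The dead-end categorization in step 2 can be sketched as a small pure function. All names and the exact tolerance semantics here are my assumptions, not pyriv's actual API:

```python
def classify_dead_end(on_coastline, on_land, dist_to_coast_km, tolerance_km):
    """Categorize a river-network dead end per the scheme above.

    on_coastline     -- node snaps exactly onto the coastline
    on_land          -- node falls inside the land polygon
    dist_to_coast_km -- distance from node to nearest coastline point
    """
    if on_coastline:
        return "river_mouth"
    if not on_land:
        return "ocean"                      # dead end out in the water
    if dist_to_coast_km <= tolerance_km:
        return "auto_complete_candidate"    # close enough to snap to coast
    return "true_dead_end"                  # network break, or candidate for
                                            # a hand-drawn coastline connection
```

With a function like this, step 3's repair tools (auto-complete, drawn segments, gap fixing) each just operate on one category of node.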

Parallel and/or Simplify for Land Graph

Without parallel processing, it took 2 hours and 20 minutes to generate a graph for ocean distances around Nunivak Island. Admittedly, my computer went to sleep during part of that, but it's way too slow regardless, especially considering that this one island is a tiny fraction of the nodes that will need to be used.

We need to consider two options for speeding things up:

  1. Use parallel processing for checking whether potential edges cross the land. This is done in a for loop, so it should be dead simple to implement. I just haven't done it before so it'll take me a bit to figure it out.

  2. Simplify the land graph. This would reduce the number of nodes we need to calculate for. However, we could run into problems if the simplification drops nodes that represent river mouths. This might make those river mouths end up on land. ...and will likely make them not line up perfectly with the actual river mouths. The difference would be negligible in terms of distance, but it would introduce problems with finding paths out of rivers. Possible solutions:

    1. Just don't simplify. This is what I'm leaning toward at the moment. It may take many hours to generate a graph, but we should be able to save the graph and re-use it so I'm not that bothered. ...and if we get it running parallel and do the processing on Aurora, it shouldn't take too long.
    2. Simplify a lot, but add the river mouth nodes back in before calculating the edges. The river mouth nodes are a relatively small subset of the full set of coastline points, so this would probably speed things up considerably.
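
Option 1 really is a near-trivial `map` over the candidate edges. A minimal sketch, using a generic 2-D segment-crossing test as a stand-in for the real land-polygon check; it uses the thread-backed `multiprocessing.dummy.Pool` so the snippet runs anywhere, but for this CPU-bound check the real speedup would come from swapping in `multiprocessing.Pool` with the same API:

```python
from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool

def _ccw(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(s1, s2):
    """True if the two 2-D segments strictly cross each other."""
    (p1, p2), (p3, p4) = s1, s2
    return (_ccw(p3, p4, p1) * _ccw(p3, p4, p2) < 0
            and _ccw(p1, p2, p3) * _ccw(p1, p2, p4) < 0)

def edge_blocked(edge, land_segments):
    """Does a candidate ocean edge cross any land-boundary segment?"""
    return any(segments_cross(edge, seg) for seg in land_segments)

land = [((0, 0), (0, 10))]                      # stand-in coastline segment
edges = [((-1, 5), (1, 5)), ((1, 1), (2, 2))]   # first crosses, second doesn't

with Pool() as pool:
    blocked = pool.map(lambda e: edge_blocked(e, land), edges)
print(blocked)  # [True, False]
```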

parallelize graph copying (deep)

networkx's graph copying is too slow for regular usage. Whenever you mutate a graph you should copy it and mutate the copy, and with a deep copy that is very, very costly.
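
Part of the cost is that a deep copy duplicates every node/edge attribute dict. A structural copy that rebuilds only the adjacency dicts while sharing the attribute values is usually much cheaper, and is all you need for adding or removing edges. A stdlib sketch on a plain dict-of-dicts adjacency structure:

```python
import copy

# toy adjacency structure: node -> neighbor -> edge-attribute dict
adj = {i: {i + 1: {"weight": float(i)}} for i in range(1000)}

deep = copy.deepcopy(adj)                          # duplicates everything
structural = {u: {v: d for v, d in nbrs.items()}   # new dicts, shared attrs
              for u, nbrs in adj.items()}

# the structural copy is safe for adding/removing edges, but NOT for
# mutating attribute dicts in place -- those are still shared:
assert structural[0][1] is adj[0][1]
assert deep[0][1] is not adj[0][1]
```

Whether this pattern can be applied depends on which mutations we actually perform; parallelizing a true deep copy itself is harder, since most of the cost is Python-object allocation.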

Land Object

The Land object will be a subclassed geopandas.GeoDataFrame. It needs the following attributes/methods:

  1. shrink
  2. simplify
  3. coords
  4. line_crosses
  5. generate_coastal_graph

Other methods that are part of the current version of the Land object should go into a CoastalGraph object.
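
As a rough interface sketch, here is a pure-Python stand-in for Land with `coords` and `line_crosses` implemented against a polygon exterior ring (the real version would subclass geopandas.GeoDataFrame and use shapely geometry; only the method names come from the list above, everything else is assumption):

```python
class Land:
    """Pure-Python sketch of the Land interface."""

    def __init__(self, ring):
        self.ring = list(ring)      # polygon exterior as (x, y) tuples

    def coords(self):
        return list(self.ring)

    def _segments(self):
        return [(self.ring[i], self.ring[i + 1])
                for i in range(len(self.ring) - 1)]

    @staticmethod
    def _cross(s1, s2):
        def ccw(a, b, c):
            return (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])
        (p1, p2), (p3, p4) = s1, s2
        return (ccw(p3, p4, p1) * ccw(p3, p4, p2) < 0
                and ccw(p1, p2, p3) * ccw(p1, p2, p4) < 0)

    def line_crosses(self, segment):
        """True if segment strictly crosses the land boundary."""
        return any(self._cross(segment, s) for s in self._segments())
```

`shrink`, `simplify`, and `generate_coastal_graph` would sit alongside these, delegating to the geometry library.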

Consider dropping edge attributes

We're currently importing all shapefile attributes as edge attributes in the graph. We're only using 'LengthKM' with the NHD data, but I think I'm not going to even use that with the National Map. ...and it's probably better to not assume we'll have that attribute anyway to make things more flexible. I've already got code to calculate distance edge weights.

So, since I think we're not going to use the attributes, it may make sense to get rid of them. I don't see an easy way to dump them out of the graph, so it might be easier to open the shapefile with geopandas, dump everything aside from geometry, and start from there.

So, basically, either: 1) find a way to drop the edge attributes or 2) write out a temporary version of the shapefile with no attributes and import that. I think option 1 is better if there's a way to do it.
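
Option 1 may be simpler than it looks: `G.edges(data=True)` hands back the live attribute dicts, so they can be cleared in place. A minimal sketch:

```python
import networkx as nx

G = nx.Graph()
G.add_edge("a", "b", LengthKM=3.2, FCode=46006)
G.add_edge("b", "c", LengthKM=1.1)

# clear every edge-attribute dict in place; distance weights can be
# recomputed afterwards from the geometry
for _, _, data in G.edges(data=True):
    data.clear()

assert all(d == {} for _, _, d in G.edges(data=True))
```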

Readme.rst change to readme.md

There are references to readme.rst all over the place in the setup files. I think these should be changed to readme.md but I need to check the implications of that change.

Make a `RivDist` report object

RiverGraph.river_distances currently returns a GeoDataFrame with multiple geometry columns. I will create a new RivDist object that subclasses GeoDataFrame. That way I can add save methods that write appropriate path or point shapefiles (a shapefile can only have one geometry column), as well as plot methods to display the results.
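
The subclassing pattern hinges on the `_constructor` property, which pandas (and hence geopandas) consults whenever an operation produces a new frame, so slices and copies stay `RivDist` instances. Illustrated here with plain pandas; the method name `longest` is hypothetical:

```python
import pandas as pd

class RivDist(pd.DataFrame):
    # pandas calls _constructor whenever an operation builds a new
    # frame, so derived frames keep the RivDist type (and its methods)
    @property
    def _constructor(self):
        return RivDist

    def longest(self, n=1):
        """Example report method (name/signature hypothetical)."""
        return self.nlargest(n, "dist_km")

rd = RivDist({"site": ["a", "b"], "dist_km": [12.5, 3.1]})
assert isinstance(rd.longest(), RivDist)
```

The save/plot methods would live on the subclass the same way, choosing one geometry column per output shapefile.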

GraphML Read/Write

It looks like reading graphml is a problem. I saved out a land graph and read it back in and found that the nodes came in as string representations of tuples instead of actual tuples. This causes a number of problems. gpickle may be better. Also, the graphml version was over twice as large.
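
If GraphML support is still wanted, the stringified tuple nodes can be recovered after reading with `ast.literal_eval` plus `nx.relabel_nodes` (which accepts a function as the mapping). gpickle avoids the round-trip problem entirely. A sketch, simulating the broken read:

```python
import ast
import networkx as nx

# simulate what comes back from read_graphml: tuple nodes as strings
G = nx.Graph()
G.add_edge("(0.0, 1.5)", "(2.0, 3.5)")

# parse each node label back into a real tuple
H = nx.relabel_nodes(G, ast.literal_eval)
assert (0.0, 1.5) in H
```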

Revisit setup.py

There's some stuff in there that I don't think is correct and/or relevant. ...entry points for instance...

Maybe Change the Name?

As this thing has developed and the coastal distance calculations have been included, it's become less focused on rivers. ...and as I become more aware of how extensively (if confusingly) developed river network navigation already is, I think it might make sense to change the name to reflect the fact that this library is not only focused on river navigation distances.

I'm thinking it should be something that points more directly toward calculating "swimmable distance" or "fish distance" (both phrases that I pull out of nowhere). I was thinking "swimpy", but that's taken. I don't know. Need to give it more thought.

Projections

Basically, distances derived from unprojected coordinate systems (e.g., WGS84 / EPSG:4326 lat/lon) can't be converted easily because the length of a degree of longitude varies with latitude; the effect grows the farther you are from the equator. If we were masochists, we could try to check inputs for projection and reproject, but that would be error prone and take us forever.

So, instead, we'll just have to be clear in the documentation that the inputs must be projected with proper distance, rather than angular, units. If we're feeling generous, maybe we can try to do a bit of introspection and throw some exceptions if people try to load data with angular units.
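
The "bit of introspection" could be as crude as a bounds check: coordinates that all fit inside the lon/lat degree ranges are almost certainly angular. This is a heuristic of my own devising, with false positives possible for tiny projected extents near the origin:

```python
def looks_geographic(coords):
    """Heuristic: True if every (x, y) fits in lon/lat degree ranges."""
    return all(-180 <= x <= 180 and -90 <= y <= 90 for x, y in coords)

def check_projected(coords):
    """Raise if the input looks like it carries angular units."""
    if looks_geographic(coords):
        raise ValueError(
            "Input looks unprojected (degree units); reproject to a "
            "distance-preserving CRS before building the network.")
```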

Look at using distance to coastline instead of buffering

Right now, I'm finding National Map coastal dead-end nodes by buffering the coastline and checking which dead ends intersect the buffer. It might be faster and more straightforward to compute the distance from each dead end to the coastline and simply threshold those values; that would also avoid the potential topology problems of buffering. I'll have to do some testing.
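
The thresholding approach boils down to a point-to-segment distance plus a comparison. A pure-Python stand-in (shapely's `distance` would do the geometric work in practice; names here are illustrative):

```python
import math

def point_seg_dist(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # project p onto the segment, clamped to its endpoints
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx*dx + dy*dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def coastal_dead_ends(dead_ends, coast_segments, tol):
    """Dead ends within tol of any coastline segment -- no buffering."""
    return [p for p in dead_ends
            if min(point_seg_dist(p, a, b) for a, b in coast_segments) <= tol]
```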

Overall redesign scheme

I think I need the following modules:

  1. GraphBuilder: Builds RiverGraphs and joins them to coastal graphs. Needs a river shapefile (or potentially a RiverGraph) and a Land (or optionally, a CoastalGraph) object as input.
  2. RiverGraph: Represents a directed network of rivers, built and pruned by GraphBuilder so that all dead ends are river mouths (other dead ends pruned). Has methods that find distance to river mouth for points (or sets of points).
  3. Land: Definitely takes a polygon. Maybe takes a polyline and converts to polygon? Can create a CoastalGraph.
  4. CoastalGraph: Methods for adding points and calculating edges (with buffer) and for finding distance between points.

Clean up and merge CoastDist branch

In my frantic effort to finish the river herring distance calculations for the east coast of the US, I've made a huge mess of the CoastDist branch. I've made some changes to pyriv.coastal and pyriv.river_graph, but mostly I've put stuff into the poorly named pyriv.rg_light module. This was sort of intended to be a stripped down (in that it doesn't really use the coastline the same way) version of river_graph, but it's kind of not that now. It's a mess.

I've got a couple of jupyter notebooks that demonstrate how the code was used for the east coast stuff. I think the goal for resolving this issue should be to (unfortunately) pretty much refactor the code so that there's a more sensible workflow possible. There are two main tasks here that pyriv is trying to accomplish:

  1. Build the network used to calculate distances. This includes the following types of distances: between sites on a single river network (i.e., no river mouth between sites); between sites on different river networks (i.e., from site to river mouth + ocean distance + upriver distance if site is up a different river); between sites on the coast.
  2. Actually calculate the distances using the built network.
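
Once task 1 has put river and coastal edges into one weighted graph, all three distance types in task 2 (same river, different rivers via the ocean, coast-to-coast) fall out of a single shortest-path call. A toy sketch with networkx; node names and weights are invented:

```python
import networkx as nx

G = nx.Graph()
G.add_edge("site_a", "mouth_1", weight=4.0)    # river 1: site upriver of mouth
G.add_edge("mouth_1", "coast_pt", weight=6.0)  # coastal edges between mouths
G.add_edge("coast_pt", "mouth_2", weight=5.0)
G.add_edge("mouth_2", "site_b", weight=3.0)    # river 2: upriver segment

# site-to-mouth + ocean distance + upriver distance, in one call
d = nx.shortest_path_length(G, "site_a", "site_b", weight="weight")
print(d)  # 18.0
```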

Task 1 is going to be potentially complex and annoying. I don't think there's any way to make that truly user-friendly. There's too much variability in the potential input data sets (e.g., hydrographic network shapefiles, coastline features). Task 2, on the other hand, could be made fairly simple and could go into some sort of easy-to-use GUI (e.g., QGIS processing toolbox or ESRI Python toolbox).

I'm going to see if I can paste my jupyter notebooks for the east coast calculations up here.

Units

We need to think a bit about how we handle unit conversions. ...pretty much just length at this point. I'm currently inclined toward using numericalunits. It's a single module, no dependencies, pip installable, easy to use, and will help us check for unit-conversion f#$k-ups.

However, it will not help us deal with unprojected coordinates. In other words, it can't convert between degrees longitude and kilometers. That's a bit of a separate issue. ...so I'll start a separate issue about projections.
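
Whatever library we land on, the safest internal convention is a single canonical unit with conversions only at the boundaries. A stdlib sketch of that convention (factor table and function name are my own, not numericalunits' API):

```python
# metres per unit; extend as needed
M_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}

def to_km(value, unit):
    """Convert a length in `unit` to kilometres (KeyError on unknown units)."""
    return value * M_PER_UNIT[unit] / 1000.0
```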

RiverGraph redesign

So, the RiverGraph object currently includes the coastline because it needs a concept of "river mouth" that is distinct from just a dead end. It might be cleaner, if possible, to use GraphBuilder to generate a RiverGraph that is pruned of any extraneous dead ends. Then RiverGraph could treat all dead ends as river mouths in distance calculations.

Essentially, I think it would make more sense to have the RiverGraph object only do river related things (like calculate the distances).
