retro-gtfs's People

Contributors

nate-wessel, xtremecurling

retro-gtfs's Issues

Handle DST and named timezones

Daylight saving time (DST) is not yet handled correctly.

Things that need to have correct DST handling added/verified:

  • service_ids set on trips
  • dates generated from service_ids in pull_data.sql for calendar.txt
  • stop times generated from service_ids in pull_data.sql
  • ...

Store times in trips table, update DB

The vehicles table has been eliminated, and the store.py entry point works great. However, the vehicle records themselves are not being stored yet, only the stop_times and trips. I need to store the vehicle times in an array alongside the trip and leave their locations incorporated in the orig_geom linestring.

I will also need to update process.py's from_DB method to use these.

Script eating up memory gradually, crashing

Had this problem after running the script for about 2.5 days. Python seems to have eaten all remaining memory on a 3 GB machine and crashed. What caused this? Could it be slower DB inserts? Cumulative errors and threads failing to close? A garbage collection failure? It doesn't seem to have run out of processes, just RAM. Also noticed about three dozen temp files that had not been deleted. Why would that be?

Reran the script several times without truncating the DB; each time it crashed less than a day later.

Put monitoring in place and run again for several days, with constant output, checking for growth in any variables.

  1. num threads
  2. size of 'fleet'
  3. ???
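One way to collect the numbers listed above is sketched here. It is a minimal sketch, not the project's code: `sample_metrics` and `growing` are hypothetical names, and `fleet` is assumed to be any sized container (a dict or list of vehicles). `tracemalloc` only sees memory allocated by Python itself, so growth in C extensions would not show up.

```python
import threading
import tracemalloc


def sample_metrics(fleet):
    """Return one monitoring sample: thread count, fleet size, traced memory."""
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    current_bytes, peak_bytes = tracemalloc.get_traced_memory()
    return {
        "num_threads": threading.active_count(),
        "fleet_size": len(fleet),
        "traced_bytes": current_bytes,
        "peak_bytes": peak_bytes,
    }


def growing(samples, key):
    """True if `key` never decreases across samples and ends higher than it began."""
    values = [s[key] for s in samples]
    if len(values) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(values, values[1:]))
    return never_drops and values[-1] > values[0]
```

Calling `sample_metrics(fleet)` once per polling cycle and printing the result would give the constant output described above; `growing(samples, "num_threads")` flags a value that only ever climbs.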

Design a quality metric for trips and stop times tables

We need a metric for assessing how likely it is that a set of trips has been adequately mapped to a set of stops. That is, a metric that tells us whether more work needs to be done before the retro-GTFS package will be a decent representation of the actual transit service performed. Some ideas (completed):

  • Variance in number of stops made by trips on a route
  • Average number of repeated stops (generally shouldn't happen)
  • Percent of trips with no stops or very few stops
  • Ratio of trips using default versus OSRM geometries
  • Mean confidence rating for OSRM results
  • A list of problematic trip_ids
  • ...

Ignore:

  • Trips that are extremely short
  • Trips that don't go near their appointed stops
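A few of the ideas above can be computed from nothing more than the ordered stops matched to each trip. The following is a rough sketch under assumptions: `trip_quality` is a hypothetical helper, the input shape (trip_id mapped to a list of stop_ids for one route) is invented for illustration, the `min_stops` threshold is arbitrary, and the input is assumed to be non-empty.

```python
from collections import Counter
from statistics import mean, pvariance


def trip_quality(stops_by_trip, min_stops=3):
    """Summarize stop-matching quality for the trips of one route.

    stops_by_trip maps trip_id -> ordered list of stop_ids (assumed shape).
    """
    counts = {trip: len(stops) for trip, stops in stops_by_trip.items()}
    # A stop seen n times in one trip contributes n - 1 repeats.
    repeats = [
        sum(n - 1 for n in Counter(stops).values())
        for stops in stops_by_trip.values()
    ]
    sparse = sorted(trip for trip, n in counts.items() if n < min_stops)
    return {
        "stop_count_variance": pvariance(list(counts.values())),
        "mean_repeated_stops": mean(repeats),
        "percent_sparse_trips": 100.0 * len(sparse) / len(counts),
        "problem_trip_ids": sparse,
    }
```

The OSRM-related ideas (default versus matched geometries, mean confidence) would need extra per-trip fields and are left out here.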

add block_id to consecutive trips

This will allow the traveler to stay on the vehicle... most importantly this means that if the trip 'starts' just outside the station, it will be linked to the station by the previous trip in the block.

Break blocks only when vehicles go off the radar. Start trips with new headsigns or route_ids, but maintain block with the vehicle.
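The blocking rule could look roughly like this. It is a sketch only: the field names (`vehicle_id`, `start`, `end`) and the 600-second "off the radar" threshold are assumptions, not values taken from the project.

```python
def assign_block_ids(trips, max_gap_seconds=600):
    """Give consecutive trips of one vehicle a shared block_id.

    trips: list of dicts with 'vehicle_id', 'start' and 'end' in epoch
    seconds (assumed field names). A new block starts only when the vehicle
    is unseen for longer than max_gap_seconds; a new headsign or route_id
    starts a new trip but does not break the block.
    """
    next_block = 0
    last_seen = {}  # vehicle_id -> (end time of previous trip, its block_id)
    for trip in sorted(trips, key=lambda t: (t["vehicle_id"], t["start"])):
        previous = last_seen.get(trip["vehicle_id"])
        if previous is not None and trip["start"] - previous[0] <= max_gap_seconds:
            block = previous[1]
        else:
            next_block += 1
            block = next_block
        trip["block_id"] = block
        last_seen[trip["vehicle_id"]] = (trip["end"], block)
    return trips
```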

Some terminal stops not being recorded

A quick look at parts of the #94 shows that the Ossington Station stop is not being recorded very well. Surely the whole trip is being made, but the stop is not being recorded for many trips. Why is this? This is probably indicative of broader problems.

Suggestion: when trips are broken, a gap is formed between two vehicle reports. Perhaps just spanning the gap would help make the final connection.

matching fails silently

I know that matching cannot be working, because the OSRM server is not running with the right data. Yet I get no error. Fix this!
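One possible fix is to validate every parsed response before using it. This sketch assumes the response is the JSON body of an OSRM match request already parsed into a dict; OSRM reports success with `"code": "Ok"` and includes a `confidence` on each matching. `MatchError` and `check_match_response` are hypothetical names, and connection failures would need to be raised separately by whatever code makes the HTTP request.

```python
class MatchError(RuntimeError):
    """Raised when a map-matching response cannot be trusted."""


def check_match_response(payload, min_confidence=0.0):
    """Validate a parsed OSRM match response and return its matchings."""
    code = payload.get("code")
    if code != "Ok":
        message = payload.get("message", "")
        raise MatchError(f"OSRM returned code {code!r}: {message}")
    matchings = payload.get("matchings") or []
    if not matchings:
        raise MatchError("OSRM returned no matchings")
    # Treat a missing confidence as zero so it cannot pass unnoticed.
    worst = min(m.get("confidence", 0.0) for m in matchings)
    if worst < min_confidence:
        raise MatchError(f"match confidence {worst} is below {min_confidence}")
    return matchings
```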

missing threads issue

It seems that a few threads are hanging unfinished during the processing phase, preventing other threads from being started. If too many trips are queued, the process will not finish and will keep sleeping forever.
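A possible mitigation is to wait for each batch with a deadline instead of sleeping until every thread finishes. This is only a sketch: `run_batch` is a hypothetical helper and each job stands in for the processing of one trip. Python cannot kill a thread, so a hung thread keeps running; marking it as a daemon just means it no longer blocks the batch or interpreter exit.

```python
import threading
import time


def run_batch(jobs, timeout_seconds):
    """Run each job in its own daemon thread; return the indexes that hung.

    jobs: list of zero-argument callables (stand-ins for per-trip work).
    """
    threads = [threading.Thread(target=job, daemon=True) for job in jobs]
    for thread in threads:
        thread.start()
    deadline = time.monotonic() + timeout_seconds
    hung = []
    for index, thread in enumerate(threads):
        # Wait only for whatever time is left before the shared deadline.
        thread.join(max(0.0, deadline - time.monotonic()))
        if thread.is_alive():
            hung.append(index)
    return hung
```

Logging the returned indexes would also show which trips are the ones that hang.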

Handle changing schedule data; match trips to appropriate schedule

Currently there is no way of handling changes to the schedule. We keep randomly checking the schedule data from the NextBus API to see if anything has changed, and we store any changes. But nothing is done to link the most recent data to an ending trip. This has worked because nothing has changed during script execution yet.

When the set of stops is selected, it needs to be the most recent set of stops only.
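Picking the right stop set could be as simple as the following sketch, which assumes each stored schedule change is kept as a `(fetched_at, stops)` pair. That shape and the name `stops_for_trip` are assumptions for illustration.

```python
def stops_for_trip(stop_versions, trip_end):
    """Return the stop set that was current when the trip ended.

    stop_versions: list of (fetched_at, stops) tuples, one per stored
    schedule change, with fetched_at and trip_end in the same time unit.
    """
    candidates = [v for v in stop_versions if v[0] <= trip_end]
    if not candidates:
        raise LookupError("no schedule data stored before the trip ended")
    # The most recent version fetched at or before the end of the trip wins.
    return max(candidates, key=lambda v: v[0])[1]
```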

Project vehicles forward/backward to termini

Check whether the trip covers a substantial portion of the route and if it does, assume that it goes all the way to the ends and therefore makes all the stops.

Project outwards to the termini at some speed based on observation of surrounding points.
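A sketch of the forward projection, under assumptions: observations are `(time_seconds, meters_along_route)` pairs, "substantial portion" is taken to mean 80% of the route length, and the speed comes from the last three points. None of these values come from the project. Projecting backward to the starting terminus would mirror this using the first few points.

```python
def project_to_terminus(points, route_length, min_coverage=0.8, window=3):
    """Estimate the arrival time at the end terminus by extrapolating speed.

    points: list of (time_seconds, meters_along_route), ordered by time.
    Returns None if the trip covers too little of the route or the vehicle
    was not moving forward over the last `window` points.
    """
    if len(points) < 2:
        return None
    covered = points[-1][1] - points[0][1]
    if covered / route_length < min_coverage:
        return None
    t0, d0 = points[-window:][0]
    t1, d1 = points[-1]
    if t1 <= t0 or d1 <= d0:
        return None
    speed = (d1 - d0) / (t1 - t0)  # meters per second near the end of the trip
    return t1 + (route_length - d1) / speed
```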

Very high speeds being reported on some routes

They look like they have reasonable data...
jv routes

High speed travel detected in trip 68766: 3rd Ave. S. & 3rd St. to Rosa Parks Station Bay K. 26282 meters in 219 seconds. (432 km/h).

High speed travel detected in trip 71330: N. 3rd St. & N. 9th Ave. to Rosa Parks Station Bay B. 25829 meters in 7 seconds. (13283 km/h).

High speed travel detected in trip 59956: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).

High speed travel detected in trip 53250: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).

High speed travel detected in trip 66628: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).

and 2559 more of this type.
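The reported figures are consistent with a plain distance-over-time calculation, which the sketch below reproduces. The 120 km/h ceiling in `is_implausible` is an arbitrary assumption, not a project setting, and both function names are hypothetical.

```python
def speed_kmh(meters, seconds):
    """Average speed in km/h for a distance in meters covered in seconds."""
    return meters / seconds * 3.6


def is_implausible(meters, seconds, max_kmh=120.0):
    """Flag stop-to-stop segments that no transit vehicle could have made."""
    if seconds <= 0:
        return True
    return speed_kmh(meters, seconds) > max_kmh


print(round(speed_kmh(26282, 219)))  # 432, as in the first message above
```

Segments of 7 seconds covering more than 10 km suggest that two stop times were assigned from nearly simultaneous vehicle reports, so a check like this may be more useful for dropping the bad stop time than for reporting it.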

Syntax Error

In file create-agency-tables.sql, there's an extra comma on line 52.

trip processing is very slow with more than ~50M vehicle records

I think I could speed up processing by doing some of the GIS stuff directly in Python rather than repeatedly making postgresql find the same records again and again. Making this change may be a rather large and cumbersome process, and I don't yet know exactly how it will be done.

Add service_id option to pull-data.sql

Long observation periods can produce datasets too large to hold in memory for OTP (for TTC at least). Need to be able to pull out just one or two days of data into GTFS format.

Moved stops can cause repeated stop_ids in output

Stop moves, e.g. a stop moved one block over with no id change, cause both geometries to be given in stops.txt with the same stop_id. I need to change the stop_id in stops and also in stop_times. The output SQL script will need to be updated as well.
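One way to keep output ids unique is to version them by location. In this sketch the first location seen keeps the original stop_id and later ones get a suffix; the `_v2` naming scheme, the string stop_ids, and the `(stop_id, lon, lat)` input shape are all assumptions.

```python
def version_stop_ids(stop_records):
    """Give each distinct location of a stop its own output stop_id.

    stop_records: list of (stop_id, lon, lat) in the order observed.
    Returns {(stop_id, lon, lat): output_id}.
    """
    seen = {}     # stop_id -> distinct locations, in order of first sighting
    mapping = {}
    for stop_id, lon, lat in stop_records:
        locations = seen.setdefault(stop_id, [])
        if (lon, lat) not in locations:
            locations.append((lon, lat))
        version = locations.index((lon, lat)) + 1
        output_id = stop_id if version == 1 else f"{stop_id}_v{version}"
        mapping[(stop_id, lon, lat)] = output_id
    return mapping
```

The same mapping would have to be applied when writing stop_times, so that each stop time points at the location that was current for its trip.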

Expand map matching search radius for very poor matches

When a very poor match is returned, try rematching with a wider error radius. This is harder computationally but may return better results for some trips.

This would particularly help a couple routes in San Francisco. It's not known if this will produce worse/weird results in other cities.
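The retry could be structured as below. This is a sketch with assumed values: the radii, the confidence threshold, and the `match` callable (a stand-in for the real request to the matching server) are all hypothetical.

```python
def match_with_widening(match, trip, radii=(20, 40, 80), min_confidence=0.5):
    """Retry map matching with wider radii until confidence is acceptable.

    match: callable (trip, radius_meters) -> (confidence, geometry).
    Returns the best (confidence, geometry, radius) seen, so a wider radius
    that gives a worse match never replaces an earlier, better one.
    """
    best = None
    for radius in radii:
        confidence, geometry = match(trip, radius)
        if best is None or confidence > best[0]:
            best = (confidence, geometry, radius)
        if confidence >= min_confidence:
            break  # good enough; skip the more expensive wider searches
    return best
```

Keeping the best result rather than the last one limits the risk of worse or odd matches in cities where the narrow radius already works.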

Stdout doesn't include timestamps

I am running store.py continuously with nohup, and the current contents of nohup.out, the log file for stdout, are:

1068 in fleet, 4 ending trips
1066 in fleet, 6 ending trips
1066 in fleet, 5 ending trips
1065 in fleet, 4 ending trips
1065 in fleet, 11 ending trips

And so on, which isn't particularly useful given the lack of timestamps. Thoughts on adding timestamps or other info to this output?
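One option is to route the status line through the standard `logging` module, which can prefix each message with the time. A minimal sketch follows; the logger name and the `make_logger` helper are assumptions rather than anything in store.py.

```python
import logging
import sys


def make_logger(stream=sys.stdout):
    """Return a logger that prefixes every message with a timestamp."""
    logger = logging.getLogger("retro_gtfs.store")  # assumed logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(message)s", "%Y-%m-%d %H:%M:%S")
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


log = make_logger()
log.info("%d in fleet, %d ending trips", 1068, 4)
```

Each line in nohup.out would then start with a date and time, followed by the same message as before.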

Default geom yielding noisy errors if route incomplete

E.g. for jv_

trip_id to process --> 60382
	default route used for direction 102_0_var0
		stop off by 2484.67682975 meters for trip 60382
		stop off by 5248.22659567 meters for trip 60382
		stop off by 7429.09854012 meters for trip 60382
		stop off by 8619.05381025 meters for trip 60382
		stop off by 10352.7808203 meters for trip 60382
