sausy-lab / retro-gtfs
Collect real-time transit data and process it into a retroactive GTFS 'schedule' which can be used for routing/analysis.
Change from random requests to linking update requests more closely to incoming trips.
Daylight saving time is not being handled correctly yet. Things that need correct DST handling added/verified:
- pull_data.sql (for calendar.txt)
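One way to get DST right is to stop hand-offsetting epoch timestamps and let a timezone database do the conversion. A minimal sketch, assuming `America/Toronto` as the agency timezone (`AGENCY_TZ` and `local_service_date` are illustrative names, not part of the repo):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

# Assumed agency timezone for illustration; a real config would supply this.
AGENCY_TZ = ZoneInfo("America/Toronto")

def local_service_date(epoch_seconds):
    """Convert a UTC epoch timestamp to the agency-local calendar date,
    letting zoneinfo pick the correct UTC offset on either side of a DST
    transition instead of applying a hard-coded offset."""
    return datetime.fromtimestamp(epoch_seconds, tz=AGENCY_TZ).date()

# 2021-03-14 02:00 local is when EST (UTC-5) switches to EDT (UTC-4):
winter = datetime.fromtimestamp(1615701600, tz=AGENCY_TZ)  # 01:00 EST
summer = datetime.fromtimestamp(1615705200, tz=AGENCY_TZ)  # 03:00 EDT
```

Anything that derives a service date for calendar.txt (e.g. in pull_data.sql, via `AT TIME ZONE` on the PostgreSQL side) should go through a conversion like this rather than a fixed offset.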
Partially implemented; needs testing to show that matching still works.
Clean up the code, and make this not be a total hack.
Is this the same thing as stop_dist? https://github.com/SAUSy-Lab/retro-gtfs/blob/master/sample_conf.py#L43
These are currently just deleting the trips. This is in trip.py.
I think this may have something to do with finding the starting trip_id, or something simple like that. It can probably be optimized pretty easily.
These trips are currently being discarded
Check whether stops are anywhere near the clean geom - don't project too far away from actual data. See e.g. jv_t60382
I believe the profile I'm using now is not routing on streetcar tracks.
The vehicles table has been eliminated, and the store.py entry point works great. However the vehicles are not being stored yet, only the stop_times and trips. I need to store the vehicle times in an array alongside the trip and leave their locations as incorporated in the orig_geom linestring.
I will also need to update process.py's from_DB method to use these.
Had this problem after running the script for about 2.5 days. Python seems to have eaten all remaining memory on a 3 GB machine, crashing. What caused this? Could it be slower DB inserts? Cumulative errors and threads failing to close? Garbage collection failure? It doesn't seem like it ran out of procs, but RAM. Also noticed ~3 dozen temp files that had not been deleted. Why would that be?
Reran script without truncating DB and had a crash less than a day later, several times.
Put monitoring in place and run again for several days, with constant output, checking for growth in any variables.
This should help with debugging, I think.
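For the monitoring proposed above, the standard library's `tracemalloc` can snapshot the Python heap periodically and report the biggest allocation sites, so growth in any one structure shows up long before the machine runs out of RAM. A minimal sketch (the `leaky` list is a hypothetical stand-in for whatever is accumulating):

```python
import tracemalloc

tracemalloc.start()

leaky = []  # stand-in for whatever structure might be accumulating

def report_top_allocations(limit=3):
    """Print the top allocation sites and return (current, peak) bytes
    traced; calling this on an interval makes leaks visible as steady
    growth at one site."""
    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.statistics("lineno")[:limit]:
        print(stat)
    return tracemalloc.get_traced_memory()

leaky.extend(range(100_000))
current, peak = report_top_allocations()
```

Pairing this with `ls`-style checks on the temp directory would also catch the undeleted temp files mentioned above.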
Since trips are broken at terminals, there are instances where terminal stops aren't being caught by the 30m buffer for all stops. Allow terminal stops an e.g. 2x buffer distance.
Why do I have it in here and can I just fix it?
We need a metric for assessing how likely it is that a set of trips has been adequately mapped to a set of stops. That is, to tell us whether more work needs to be done before the retro-GTFS package will be a decent representation of the actual transit service performed. Some ideas (completed):
Ignore: in nb_api.py
This may be fine for storing the trips, but we need multiprocessing for trip processing; currently this is using threading.
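Threads won't parallelize CPU-bound trip processing because of the GIL; a process pool will. A minimal sketch of the switch, where `process_trip` is a hypothetical stand-in for the real per-trip work (map matching, stop matching, etc.):

```python
from multiprocessing import Pool

def process_trip(trip_id):
    """Hypothetical stand-in for per-trip processing; any CPU-bound
    function behaves the same way under a Pool."""
    return trip_id, sum(i * i for i in range(1000))

def process_all(trip_ids, workers=4):
    # Unlike threading.Thread, worker processes sidestep the GIL and
    # actually use multiple cores for CPU-bound work.
    with Pool(processes=workers) as pool:
        return dict(pool.map(process_trip, trip_ids))

if __name__ == "__main__":
    results = process_all([60382, 69648, 71885])
```

Each worker would need its own DB connection (connections don't survive forking), which is the main refactoring cost of this change.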
How could that be? Where is that geometry coming from?
This will allow the traveler to stay on the vehicle... most importantly this means that if the trip 'starts' just outside the station, it will be linked to the station by the previous trip in the block.
Break blocks only when vehicles go off the radar. Start trips with new headsigns or route_ids, but maintain block with the vehicle.
Trips that start before midnight and end after are not working with the current time zone handling.
See e.g. jv trip 71885
For some trips the first stop (which looks like it is being made) seems to be appearing at the end of the sequence.
Line 62 in 68f6be5
When running this script, I get the error in the title. AFAIK, you need an ALTER TABLE statement to add a new column to a table, so this would make sense, since when stop_times is created, it doesn't include a fake_stop_id column.
A quick look at parts of #94 shows that the Ossington station stop is not being recorded very well. Surely the whole trip is being made, but the stop is not being recorded for many trips. Why is this? This is probably indicative of broader problems.
Suggestion: when trips are broken, a gap is formed between two vehicle reports. Perhaps just spanning the gap would help make the final connection.
I know that matching cannot be working as the OSRM server is not running with the right data. Yet I get no error. Fix this!
It seems that a few threads are hanging unfinished during the processing phase, preventing other threads from being initiated. If too many trips are queued, the process will not finish and will keep sleeping forever.
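One way to keep a hung worker from stalling the whole queue is to join with a timeout instead of blocking indefinitely, then flag the stuck trip and reclaim the slot. A sketch of the detection, with an artificial slow worker:

```python
import threading
import time

def worker(delay):
    # Stand-in for a trip-processing call that has hung.
    time.sleep(delay)

hung = threading.Thread(target=worker, args=(1.5,), daemon=True)
hung.start()
hung.join(timeout=0.1)           # bounded wait instead of sleeping forever
still_running = hung.is_alive()  # True -> log the trip, free the slot
```

The daemon flag means a genuinely stuck thread won't keep the interpreter alive at shutdown either.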
Currently there is no way of handling changes to the schedule. We keep randomly checking the schedule data from the NextBus API to see if anything has changed, and we store any changes. But nothing is done to link the most recent data to an ending trip. This has worked because nothing has changed during script execution yet.
When the set of stops is selected, it needs to be the most recent set of stops only.
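Selecting only the latest version of each stop is a per-group-max query; in PostgreSQL `SELECT DISTINCT ON (stop_id) ... ORDER BY stop_id, reported DESC` does it directly. A portable sketch using SQLite and a hypothetical schema (`reported` marking when that version was fetched from the API):

```python
import sqlite3

# Hypothetical stand-in for a stops table that keeps every historical
# version of a stop; column names are assumptions for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stops (stop_id TEXT, stop_name TEXT, reported INTEGER)")
con.executemany(
    "INSERT INTO stops VALUES (?, ?, ?)",
    [("s1", "Old Name", 100), ("s1", "New Name", 200), ("s2", "Other", 150)],
)
# Keep only the most recent row per stop_id.
latest = con.execute(
    """SELECT stop_id, stop_name FROM stops AS s
       WHERE reported = (SELECT MAX(reported) FROM stops WHERE stop_id = s.stop_id)
       ORDER BY stop_id"""
).fetchall()
```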
Check whether the trip covers a substantial portion of the route and if it does, assume that it goes all the way to the ends and therefore makes all the stops.
Project outwards to the termini at some speed based on observation of surrounding points.
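The projection-to-terminus idea above amounts to linear extrapolation from the last observed points. A sketch of the arithmetic (not the package's actual method; times in seconds, distances along the route in meters):

```python
def extrapolate_terminus_time(t1, d1, t2, d2, d_terminus):
    """Estimate arrival time at the terminus by projecting outward at the
    speed observed between the two most recent vehicle reports."""
    speed = (d2 - d1) / (t2 - t1)      # m/s from the surrounding points
    return t2 + (d_terminus - d2) / speed

# e.g. reports at (t=0s, 0m) and (t=10s, 100m), terminus 150m along
# the route -> estimated arrival at t=15s.
eta = extrapolate_terminus_time(0, 0, 10, 100, 150)
```

Averaging over several surrounding points rather than just the last two would make the estimated speed less noisy.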
Work with schedule data imported into PostgreSQL using GTFS tables and field names directly.
They look like they have reasonable data...
jv routes
High speed travel detected in trip 68766: 3rd Ave. S. & 3rd St. to Rosa Parks Station Bay K. 26282 meters in 219 seconds. (432 km/h).
High speed travel detected in trip 71330: N. 3rd St. & N. 9th Ave. to Rosa Parks Station Bay B. 25829 meters in 7 seconds. (13283 km/h).
High speed travel detected in trip 59956: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).
High speed travel detected in trip 53250: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).
High speed travel detected in trip 66628: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).
and 2559 more of this type.
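The speed check behind these messages is simple to state: distance between consecutive matched stops over elapsed time, converted to km/h and compared to a plausibility threshold. A sketch (the 120 km/h cutoff is an assumption for illustration, not the repo's configured value):

```python
def segment_speed_kmh(meters, seconds):
    """Implied speed between two consecutive stop times."""
    return (meters / seconds) * 3.6

def is_implausible(meters, seconds, max_kmh=120):
    """Flag inter-stop segments faster than a plausible vehicle speed."""
    return segment_speed_kmh(meters, seconds) > max_kmh

# The first logged trip above: 26282 m in 219 s -> ~432 km/h, flagged.
flagged = is_implausible(26282, 219)
```

The 7-second cases (13283 km/h and 7131 km/h) look less like bad matching and more like two reports being attributed to the wrong trip, which a check like this can separate out for inspection.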
That is, it still shows stops that aren't even yet available.
In file create-agency-tables.sql, there's an extra comma on line 52.
Vehicle sequencing and trip scrubbing may be happening twice. Remove vehicle sequencing from the storage step completely.
This depends on Project-OSRM/osrm-backend#4785
Data for the TTC works well (20-second frequency) but other agencies at > 30 seconds are giving weird results. OSRM needs fixing and the transit.lua profile needs to be rewritten to favour known transit routes.
Perhaps allow a retry or two and then report an error to the problems field rather than aborting the script.
Ensure that reprocessing of trips deletes any previous entries for that trip in the stop_times table.
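The idempotent pattern here is delete-then-insert keyed on trip_id, ideally in one transaction. A sketch using SQLite as a stand-in for PostgreSQL; the table and column names loosely mirror the repo's schema and are assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stop_times (trip_id INTEGER, stop_id TEXT, etime REAL)")

def store_stop_times(con, trip_id, rows):
    """Replace any previous run's rows for this trip before inserting,
    so reprocessing never leaves stale stop_times behind."""
    con.execute("DELETE FROM stop_times WHERE trip_id = ?", (trip_id,))
    con.executemany("INSERT INTO stop_times VALUES (?, ?, ?)",
                    [(trip_id, s, t) for s, t in rows])
    con.commit()

store_stop_times(con, 60382, [("a", 1.0), ("b", 2.0)])
store_stop_times(con, 60382, [("a", 1.5)])  # reprocess: old rows replaced
count = con.execute(
    "SELECT COUNT(*) FROM stop_times WHERE trip_id = 60382").fetchone()[0]
```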
I think I could speed up processing by doing some of the GIS stuff directly in Python rather than repeatedly making postgresql find the same records again and again. Making this change may be a rather large and cumbersome process, and I don't yet know exactly how it will be done.
Long observation periods can produce datasets too large to hold in memory for OTP (for TTC at least). Need to be able to pull out just one or two days of data into GTFS format.
Only some things that result in the deletion of a trip are currently being flagged in the problem field. All issues should be appended.
Stop moves, e.g. a stop moved one block over with no id change, cause both geometries to be given in stops.txt with the same stop_id. I need to change the stop_id in stops and also in stop_times. The output SQL script will need to be updated as well.
Rather, flag them off but retain them; thus, processing can be retried with possibly different parameters.
Having got rid of calendar_dates.txt, you must now find a new way of finding the service_id of a trip. Do it by finding the days from the epoch or something like that, which will always be unique and replicable for the same date.
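The days-from-epoch idea sketches out to a one-liner that is unique and reproducible for any given local date (illustrative name, not the repo's code):

```python
from datetime import date

def service_id_for(d):
    """Derive a service_id as the number of days since the Unix epoch:
    the same local date always maps to the same id, with no
    calendar_dates.txt lookup needed."""
    return (d - date(1970, 1, 1)).days
```

The date fed in should be the agency-local service date, not the UTC date, or trips near midnight UTC will land on the wrong service_id.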
When a very poor match is returned, try rematching with a wider error radius. This is harder computationally but may return better results for some trips.
This would particularly help a couple routes in San Francisco. It's not known if this will produce worse/weird results in other cities.
Running store.py with nohup continuously, the current contents of nohup.out (the log file for stdout) are:
1068 in fleet, 4 ending trips
1066 in fleet, 6 ending trips
1066 in fleet, 5 ending trips
1065 in fleet, 4 ending trips
1065 in fleet, 11 ending trips
And so on, which isn't particularly useful given the lack of timestamps. Thoughts on adding timestamps or other info to this output?
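Routing these fleet-status lines through the standard `logging` module would add timestamps for free. A minimal sketch of the formatting, assuming the current output comes from bare print() calls:

```python
import logging

# Every line gets an asctime prefix instead of a bare message.
logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)

formatter = logging.Formatter("%(asctime)s %(message)s")
record = logging.makeLogRecord({"msg": "1068 in fleet, 4 ending trips"})
line = formatter.format(record)  # e.g. "2018-05-02 14:03:21,511 1068 in fleet, 4 ending trips"
```

A `logging.FileHandler` with this formatter would replace the reliance on nohup.out entirely, and could also carry the vehicle/trip counts as structured fields.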
Do I need to go back and do something about this? At the least I need to store this in the trip problem field.
Should drop off the first one. I'm seeing this right now on JTA trip_id = 69648
E.g. for jv_
trip_id to process --> 60382
default route used for direction 102_0_var0
stop off by 2484.67682975 meters for trip 60382
stop off by 5248.22659567 meters for trip 60382
stop off by 7429.09854012 meters for trip 60382
stop off by 8619.05381025 meters for trip 60382
stop off by 10352.7808203 meters for trip 60382
store.py should essentially just do inserts and leave the rest for processing.py, unless the doMatching flag is present