sausy-lab / retro-gtfs
Collect real-time transit data and process it into a retroactive GTFS 'schedule' which can be used for routing/analysis.
Change from random requests to linking update requests more closely to incoming trips.
Daylight saving time is not being handled correctly yet. Things that need correct DST handling added/verified:
- pull_data.sql (for calendar.txt)
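One way to get DST right is to stop hand-offsetting epoch timestamps and let a timezone database do the conversion. A minimal sketch, assuming `America/Toronto` as the agency timezone (`AGENCY_TZ` and `local_service_date` are illustrative names, not part of the repo):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

# Assumed agency timezone for illustration; a real config would supply this.
AGENCY_TZ = ZoneInfo("America/Toronto")

def local_service_date(epoch_seconds):
    """Convert a UTC epoch timestamp to the agency-local calendar date,
    letting zoneinfo pick the correct UTC offset on either side of a DST
    transition instead of applying a hard-coded offset."""
    return datetime.fromtimestamp(epoch_seconds, tz=AGENCY_TZ).date()

# 2021-03-14 02:00 local is when EST (UTC-5) switches to EDT (UTC-4):
winter = datetime.fromtimestamp(1615701600, tz=AGENCY_TZ)  # 01:00 EST
summer = datetime.fromtimestamp(1615705200, tz=AGENCY_TZ)  # 03:00 EDT
```

Anything that derives a service date for calendar.txt (e.g. in pull_data.sql, via `AT TIME ZONE` on the PostgreSQL side) should go through a conversion like this rather than a fixed offset.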
Partially implemented; needs testing to show that matching still works.
Clean up the code, and make this not be a total hack.
Is this the same thing as stop_dist? https://github.com/SAUSy-Lab/retro-gtfs/blob/master/sample_conf.py#L43
These are currently just deleting the trips. This is in trip.py.
I think this may have something to do with finding the starting trip_id, or something simple like that. It can probably be optimized pretty easily.
These trips are currently being discarded
Check whether stops are anywhere near the clean geom - don't project too far away from actual data. See e.g. jv_t60382
I believe the profile I'm using now is not routing on streetcar tracks.
The vehicles table has been eliminated, and the store.py entry point works great. However the vehicles are not being stored yet, only the stop_times and trips. I need to store the vehicle times in an array alongside the trip and leave their locations as incorporated in the orig_geom linestring.
I will also need to update process.py's from_DB method to use these.
Had this problem after running the script for about 2.5 days. Python seems to have eaten all remaining memory on a 3 GB machine, crashing. What caused this? Could it be slower DB inserts? Cumulative errors and threads failing to close? Garbage collection failure? It doesn't seem like it ran out of procs, but RAM. Also noticed ~3 dozen temp files that had not been deleted. Why would that be?
Reran script without truncating DB and had a crash less than a day later, several times.
Put monitoring in place and run again for several days, with constant output, checking for growth in any variables.
This should help with debugging, I think.
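For the monitoring proposed above, the standard library's `tracemalloc` can snapshot the Python heap periodically and report the biggest allocation sites, so growth in any one structure shows up long before the machine runs out of RAM. A minimal sketch (the `leaky` list is a hypothetical stand-in for whatever is accumulating):

```python
import tracemalloc

tracemalloc.start()

leaky = []  # stand-in for whatever structure might be accumulating

def report_top_allocations(limit=3):
    """Print the top allocation sites and return (current, peak) bytes
    traced; calling this on an interval makes leaks visible as steady
    growth at one site."""
    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.statistics("lineno")[:limit]:
        print(stat)
    return tracemalloc.get_traced_memory()

leaky.extend(range(100_000))
current, peak = report_top_allocations()
```

Pairing this with `ls`-style checks on the temp directory would also catch the undeleted temp files mentioned above.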
Since trips are broken at terminals, there are instances where terminal stops aren't being caught by the 30m buffer for all stops. Allow terminal stops an e.g. 2x buffer distance.
Why do I have it in here and can I just fix it?
We need a metric for assessing how likely it is that a set of trips has been adequately mapped to a set of stops. That is, to tell us whether more work needs to be done before the retro-GTFS package will be a decent representation of the actual transit service performed. Some ideas (completed):
Ignore: in nb_api.py
This may be fine for storing the trips, but we need multiprocessing for trip processing; currently this is using threading.
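Threads won't parallelize CPU-bound trip processing because of the GIL; a process pool will. A minimal sketch of the switch, where `process_trip` is a hypothetical stand-in for the real per-trip work (map matching, stop matching, etc.):

```python
from multiprocessing import Pool

def process_trip(trip_id):
    """Hypothetical stand-in for per-trip processing; any CPU-bound
    function behaves the same way under a Pool."""
    return trip_id, sum(i * i for i in range(1000))

def process_all(trip_ids, workers=4):
    # Unlike threading.Thread, worker processes sidestep the GIL and
    # actually use multiple cores for CPU-bound work.
    with Pool(processes=workers) as pool:
        return dict(pool.map(process_trip, trip_ids))

if __name__ == "__main__":
    results = process_all([60382, 69648, 71885])
```

Each worker would need its own DB connection (connections don't survive forking), which is the main refactoring cost of this change.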
How could that be? Where is that geometry coming from?
This will allow the traveler to stay on the vehicle... most importantly this means that if the trip 'starts' just outside the station, it will be linked to the station by the previous trip in the block.
Break blocks only when vehicles go off the radar. Start trips with new headsigns or route_ids, but maintain block with the vehicle.
Trips that start before midnight and end after are not working with the current time zone handling.
See e.g. jv trip 71885
For some trips the first stop (which looks like it is being made) seems to be appearing at the end of the sequence.
Line 62 in 68f6be5
When running this script, I get the error in the title. AFAIK, you need an ALTER TABLE statement to add a new column to a table, so this would make sense, since when stop_times is created, it doesn't include a fake_stop_id column.
A quick look at parts of #94 shows that the Ossington station stop is not being recorded very well. Surely the whole trip is being made, but the stop is not being recorded for many trips. Why is this? This is probably indicative of broader problems.
Suggestion: when trips are broken, a gap is formed between two vehicle reports. Perhaps just spanning the gap would help make the final connection.
I know that matching cannot be working as the OSRM server is not running with the right data. Yet I get no error. Fix this!
It seems that a few threads are hanging unfinished during the processing phase, preventing other threads from being initiated. If too many trips are queued, the process will not finish and will keep sleeping forever.
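One way to keep a hung worker from stalling the whole queue is to join with a timeout instead of blocking indefinitely, then flag the stuck trip and reclaim the slot. A sketch of the detection, with an artificial slow worker:

```python
import threading
import time

def worker(delay):
    # Stand-in for a trip-processing call that has hung.
    time.sleep(delay)

hung = threading.Thread(target=worker, args=(1.5,), daemon=True)
hung.start()
hung.join(timeout=0.1)           # bounded wait instead of sleeping forever
still_running = hung.is_alive()  # True -> log the trip, free the slot
```

The daemon flag means a genuinely stuck thread won't keep the interpreter alive at shutdown either.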
Currently there is no way of handling changes to the schedule. We keep randomly checking the schedule data from the NextBus API to see if anything has changed, and we store any changes. But nothing is done to link the most recent data to an ending trip. This has worked because nothing has changed during script execution yet.
When the set of stops is selected, it needs to be the most recent set of stops only.
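Selecting only the latest version of each stop is a per-group-max query; in PostgreSQL `SELECT DISTINCT ON (stop_id) ... ORDER BY stop_id, reported DESC` does it directly. A portable sketch using SQLite and a hypothetical schema (`reported` marking when that version was fetched from the API):

```python
import sqlite3

# Hypothetical stand-in for a stops table that keeps every historical
# version of a stop; column names are assumptions for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stops (stop_id TEXT, stop_name TEXT, reported INTEGER)")
con.executemany(
    "INSERT INTO stops VALUES (?, ?, ?)",
    [("s1", "Old Name", 100), ("s1", "New Name", 200), ("s2", "Other", 150)],
)
# Keep only the most recent row per stop_id.
latest = con.execute(
    """SELECT stop_id, stop_name FROM stops AS s
       WHERE reported = (SELECT MAX(reported) FROM stops WHERE stop_id = s.stop_id)
       ORDER BY stop_id"""
).fetchall()
```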
Check whether the trip covers a substantial portion of the route and if it does, assume that it goes all the way to the ends and therefore makes all the stops.
Project outwards to the termini at some speed based on observation of surrounding points.
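The projection-to-terminus idea above amounts to linear extrapolation from the last observed points. A sketch of the arithmetic (not the package's actual method; times in seconds, distances along the route in meters):

```python
def extrapolate_terminus_time(t1, d1, t2, d2, d_terminus):
    """Estimate arrival time at the terminus by projecting outward at the
    speed observed between the two most recent vehicle reports."""
    speed = (d2 - d1) / (t2 - t1)      # m/s from the surrounding points
    return t2 + (d_terminus - d2) / speed

# e.g. reports at (t=0s, 0m) and (t=10s, 100m), terminus 150m along
# the route -> estimated arrival at t=15s.
eta = extrapolate_terminus_time(0, 0, 10, 100, 150)
```

Averaging over several surrounding points rather than just the last two would make the estimated speed less noisy.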
Work with schedule data imported into PostgreSQL using GTFS tables and field names directly.
They look like they have reasonable data...
jv routes
High speed travel detected in trip 68766: 3rd Ave. S. & 3rd St. to Rosa Parks Station Bay K. 26282 meters in 219 seconds. (432 km/h).
High speed travel detected in trip 71330: N. 3rd St. & N. 9th Ave. to Rosa Parks Station Bay B. 25829 meters in 7 seconds. (13283 km/h).
High speed travel detected in trip 59956: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).
High speed travel detected in trip 53250: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).
High speed travel detected in trip 66628: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).
and 2559 more of this type.
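The speed check behind these messages is simple to state: distance between consecutive matched stops over elapsed time, converted to km/h and compared to a plausibility threshold. A sketch (the 120 km/h cutoff is an assumption for illustration, not the repo's configured value):

```python
def segment_speed_kmh(meters, seconds):
    """Implied speed between two consecutive stop times."""
    return (meters / seconds) * 3.6

def is_implausible(meters, seconds, max_kmh=120):
    """Flag inter-stop segments faster than a plausible vehicle speed."""
    return segment_speed_kmh(meters, seconds) > max_kmh

# The first logged trip above: 26282 m in 219 s -> ~432 km/h, flagged.
flagged = is_implausible(26282, 219)
```

The 7-second cases (13283 km/h and 7131 km/h) look less like bad matching and more like two reports being attributed to the wrong trip, which a check like this can separate out for inspection.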
That is, it still shows stops that aren't even yet available.
In file create-agency-tables.sql, there's an extra comma on line 52.
Vehicle sequencing and trip scrubbing may be happening twice. Remove vehicle sequencing from the storage step completely.
This depends on Project-OSRM/osrm-backend#4785
Data for the TTC works well (20-second frequency) but other agencies at > 30 seconds are giving weird results. OSRM needs fixing and the transit.lua profile needs to be rewritten to favour known transit routes.
Perhaps allow a retry or two and then report an error to the problems field rather than aborting the script.
Ensure that reprocessing of trips deletes any previous entries for that trip in the stop_times table.
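The idempotent pattern here is delete-then-insert keyed on trip_id, ideally in one transaction. A sketch using SQLite as a stand-in for PostgreSQL; the table and column names loosely mirror the repo's schema and are assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stop_times (trip_id INTEGER, stop_id TEXT, etime REAL)")

def store_stop_times(con, trip_id, rows):
    """Replace any previous run's rows for this trip before inserting,
    so reprocessing never leaves stale stop_times behind."""
    con.execute("DELETE FROM stop_times WHERE trip_id = ?", (trip_id,))
    con.executemany("INSERT INTO stop_times VALUES (?, ?, ?)",
                    [(trip_id, s, t) for s, t in rows])
    con.commit()

store_stop_times(con, 60382, [("a", 1.0), ("b", 2.0)])
store_stop_times(con, 60382, [("a", 1.5)])  # reprocess: old rows replaced
count = con.execute(
    "SELECT COUNT(*) FROM stop_times WHERE trip_id = 60382").fetchone()[0]
```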
I think I could speed up processing by doing some of the GIS stuff directly in Python rather than repeatedly making postgresql find the same records again and again. Making this change may be a rather large and cumbersome process, and I don't yet know exactly how it will be done.
Long observation periods can produce datasets too large to hold in memory for OTP (for TTC at least). Need to be able to pull out just one or two days of data into GTFS format.
Only some things that result in the deletion of a trip are currently being flagged in the problem field. All issues should be appended.
Stop moves, e.g. a stop moved one block over with no id change, cause both geometries to be given in stops.txt with the same stop_id. I need to change the stop_id in stops and also in stop_times. The output SQL script will need to be updated as well.
Rather, flag them off but retain them; thus, processing can be retried with possibly different parameters.
Having got rid of calendar_dates.txt, you must now find a new way of finding the service_id of a trip. Do it by finding the days from the epoch or something like that, which will always be unique and replicable for the same date.
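The days-from-epoch idea sketches out to a one-liner that is unique and reproducible for any given local date (illustrative name, not the repo's code):

```python
from datetime import date

def service_id_for(d):
    """Derive a service_id as the number of days since the Unix epoch:
    the same local date always maps to the same id, with no
    calendar_dates.txt lookup needed."""
    return (d - date(1970, 1, 1)).days
```

The date fed in should be the agency-local service date, not the UTC date, or trips near midnight UTC will land on the wrong service_id.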
When a very poor match is returned, try rematching with a wider error radius. This is harder computationally but may return better results for some trips.
This would particularly help a couple routes in San Francisco. It's not known if this will produce worse/weird results in other cities.
Running store.py with nohup continuously, the current contents of nohup.out (the log file for stdout) are:
1068 in fleet, 4 ending trips
1066 in fleet, 6 ending trips
1066 in fleet, 5 ending trips
1065 in fleet, 4 ending trips
1065 in fleet, 11 ending trips
And so on, which isn't particularly useful given the lack of timestamps. Thoughts on adding timestamps or other info to this output?
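Routing these fleet-status lines through the standard `logging` module would add timestamps for free. A minimal sketch of the formatting, assuming the current output comes from bare print() calls:

```python
import logging

# Every line gets an asctime prefix instead of a bare message.
logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)

formatter = logging.Formatter("%(asctime)s %(message)s")
record = logging.makeLogRecord({"msg": "1068 in fleet, 4 ending trips"})
line = formatter.format(record)  # e.g. "2018-05-02 14:03:21,511 1068 in fleet, 4 ending trips"
```

A `logging.FileHandler` with this formatter would replace the reliance on nohup.out entirely, and could also carry the vehicle/trip counts as structured fields.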
Do I need to go back and do something about this? At the least I need to store this in the trip problem field.
Should drop off the first one. I'm seeing this right now on JTA trip_id = 69648
E.g. for jv_
trip_id to process --> 60382
default route used for direction 102_0_var0
stop off by 2484.67682975 meters for trip 60382
stop off by 5248.22659567 meters for trip 60382
stop off by 7429.09854012 meters for trip 60382
stop off by 8619.05381025 meters for trip 60382
stop off by 10352.7808203 meters for trip 60382
store.py should essentially just do inserts and leave the rest for processing.py, unless the doMatching flag is present