wri-cities / static-gtfs-manager

GUI interface for creating, editing, exporting of static GTFS data for a public transit authority

License: GNU General Public License v3.0

Python 11.78% HTML 20.30% JavaScript 63.69% CSS 4.17% Dockerfile 0.02% Shell 0.02% Batchfile 0.01% Procfile 0.01%

gtfs-generator gtfs transit-agencies static-gtfs india python tornado-web public-transport bus metro

static-gtfs-manager's Issues

Stops : put a button/toggle to make table list only the stops that are visible on map

This could get a little tricky as the conventional flow is that the tabulator table is the main data holder and the map populates itself from there.

Initial thoughts on how to achieve this:

Leaflet - get current map bounds
Get list of all markers located within said bounds
Retrieve list of stop_ids from said markers
Run tabulator function to set filter with stop_id IN said list.

To "break out" of the map constraints, user presses the button or toggle again. At that point, need to run a function to remove filtering on the tabulator table.

Tricky parts : There may currently be triggers set to populate map based on what the table is showing. That shouldn't set off an infinite loop with both map and table going millennial on us and getting triggered by each other endlessly.

Logging

Backend log file to capture all the print() statements, with timestamps. Maybe CSV with categorization. In Maintenance section, let user download and clear log file.
Possible categories : DB write, DB read, which page, which section, etc.
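
A minimal sketch of such a log, assuming Python's standard logging module replaces the print() statements. The CsvFormatter class, the make_logger helper and the "category" field are hypothetical names for illustration, not part of the existing code.

```python
import csv
import io
import logging


class CsvFormatter(logging.Formatter):
    """Format each record as one CSV row: timestamp, category, message."""

    def format(self, record):
        buffer = io.StringIO()
        # "category" is a hypothetical extra field, e.g. "DB read" or "DB write"
        category = getattr(record, "category", "general")
        csv.writer(buffer).writerow(
            [self.formatTime(record), category, record.getMessage()]
        )
        return buffer.getvalue().rstrip("\r\n")


def make_logger(stream, name="gtfs_manager"):
    """Return a logger that writes CSV rows to the given stream or open file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(CsvFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


log_stream = io.StringIO()  # stand-in for an open log file
logger = make_logger(log_stream)
logger.info("trips has 0 rows for route_id = R001", extra={"category": "DB read"})
```

Because every row is valid CSV, the Maintenance section could serve the file as a download and truncate it to clear it.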

Calendar : date picker for start and end dates

Make it work on Firefox browser

The tabulator JS library on the front end seems to have some issues with Firefox browser. Yet to do full testing, would be great if folks who are more invested in making this work in Firefox (mozilla friends?) can take this up. At my end I'm satisfied with running this on Chrome / Chromium browser.

Stops : make map show filtered results of table

Can use dataFiltered callback from here: http://tabulator.info/docs/3.4#callbacks-filter

Shake field if invalid

Apply the shakeIt() function everywhere in the web pages where data is not entered properly and the save button (or similar) is pressed. Right now it works in the Misc > Maintenance section and a few other places.

Schedules : show status messages next to route selector for loading trips

Schedules page, Trips tab.
Show status messages such as "loading..." and "loaded" next to the route selector while trips load, as this can take time.

Routes : show agency_id column

This was an optional field in the GTFS spec, but if we want to support multiple agencies then this becomes necessary. For now, add a column "agency_id" to the routes tabulator table.

For later: make dropdown for selecting agency from existing agencies.

Sequence stop addition: exclude stops already in list

Some reservations : may there be cases of a route circling around or going into a constrained area, coming back out the same way, and having the same stop further in the sequence?

Route Shape : need to sort the array by sequence

Bug seen on first time load on heroku (but not on local machine):

The shape's sequence order is messed up on loading from DB. Checking db.json, confirmed : the rows are stored as 1,10,100... ie text sorting.

  "shapes": {
    "1": {
      "shape_dist_traveled": 0.0,
      "shape_id": "R001_0",
      "shape_pt_lat": 10.110608860405785,
      "shape_pt_lon": 76.34918872659786,
      "shape_pt_sequence": 1
    },
    "10": {
      "shape_dist_traveled": 0.63,
      "shape_id": "R001_0",
      "shape_pt_lat": 10.105114690637768,
      "shape_pt_lon": 76.35012437809694,
      "shape_pt_sequence": 10
    },
    "100": {
      "shape_dist_traveled": 15.880000000000006,
      "shape_id": "R001_0",
      "shape_pt_lat": 9.990707688393124,
      "shape_pt_lon": 76.28752271426649,
      "shape_pt_sequence": 100
    },
    "101": { ...

When we upload a fresh shapefile, the ordering is proper. It's also peculiar that just the onward direction shape's order is messed up while the return direction shape is fine. So this seems to be an artefact from the GTFS import mechanism.

In any case, the program needs to have a sorting mechanism when retrieving the data for a shape. This will probably be better done at python end itself.
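
A minimal sketch of that sorting step on the Python side, assuming the rows for all shapes arrive as a list of dicts shaped like the db.json entries above. The function name sorted_shape_points is hypothetical.

```python
def sorted_shape_points(shape_rows, shape_id):
    """Return one shape's points ordered numerically by shape_pt_sequence."""
    points = [row for row in shape_rows if row.get("shape_id") == shape_id]
    # int() guards against sequences stored as text, which would sort as 1, 10, 100, 2...
    return sorted(points, key=lambda row: int(row["shape_pt_sequence"]))


# Rows in the text-sorted order seen in db.json
rows = [
    {"shape_id": "R001_0", "shape_pt_sequence": 1},
    {"shape_id": "R001_0", "shape_pt_sequence": 10},
    {"shape_id": "R001_0", "shape_pt_sequence": 100},
    {"shape_id": "R001_0", "shape_pt_sequence": "2"},
    {"shape_id": "R001_1", "shape_pt_sequence": 1},
]
ordered = sorted_shape_points(rows, "R001_0")
sequence_numbers = [int(row["shape_pt_sequence"]) for row in ordered]
```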

Maintenance : Stop id delete: have to delete from sequence db too

To replicate:

In Routes section, select a route and finalize (save) the sequence.
In Misc > Maintenance > Delete section, delete one of the stops.
Go back to routes and load that route again. The sequences don't load now: table is empty and so is map. Browser console gives an error:

Uncaught TypeError: Cannot set property 'stop_id' of undefined
    at initiateSequence (routes.js:453)
    at XMLHttpRequest.xhr.onload (routes.js:424)

It's because the stop_id was still left in the sequence db. The program read that and tried to load data for that stop_id and errored out.

To do: delete the stop_id from sequence db as well. Link for code location:
https://github.com/WRI-Cities/static-GTFS-manager/blob/v1.0.0/GTFSserverfunctions.py#L629

Secondary to-do: Even renaming of stop_id doesn't take care of this I believe.
https://github.com/WRI-Cities/static-GTFS-manager/blob/v1.0.0/GTFSserverfunctions.py#L722

And, resilience planning : This sort of error may happen again. The sequence db is an extra thing that is not part of the GTFS feed spec, it is created and kept by this program to help standardize the stops sequence for a route and keep a template ready for when a new trip is provisioned. It should not be allowed to fail the program. Therefore, the code reading it should quietly skip a non-matching stop_id encountered and move on to next stop in sequence. This is lower priority, so moving it to another issue.

get_argument() function should have default value everywhere

This is related with a lot of other bugs that are coming up when creating schedules from scratch. If an argument is missing in a GET or POST request received from the webpage, then instead of erroring out, the tornado handler should assign it a default value like None or ''(empty string) and handle things gracefully.

Ref: http://www.tornadoweb.org/en/stable/web.html#tornado.web.RequestHandler.get_argument

Fixing feeds: find missing routes, trips, stops

Scan thru trips db and list or mark routes that are missing from routes db.

Find routes having no trips defined

Detect stops mentioned in stop_times but missing from stops db.
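
The three checks above can be sketched with set differences, assuming each table is available as a list of dicts. The function name and the keys of the returned report are hypothetical.

```python
def find_feed_gaps(routes, trips, stops, stop_times):
    """Report ids that are referenced but undefined, or defined but unused."""
    route_ids = {row["route_id"] for row in routes}
    trip_route_ids = {row["route_id"] for row in trips}
    stop_ids = {row["stop_id"] for row in stops}
    timed_stop_ids = {row["stop_id"] for row in stop_times}
    return {
        # routes used by trips but missing from the routes db
        "routes_missing_from_routes_db": sorted(trip_route_ids - route_ids),
        # routes that have no trips defined
        "routes_without_trips": sorted(route_ids - trip_route_ids),
        # stops used in stop_times but missing from the stops db
        "stops_missing_from_stops_db": sorted(timed_stop_ids - stop_ids),
    }


report = find_feed_gaps(
    routes=[{"route_id": "R001"}, {"route_id": "R002"}],
    trips=[{"route_id": "R001"}, {"route_id": "R009"}],
    stops=[{"stop_id": "ALVA"}],
    stop_times=[{"stop_id": "ALVA"}, {"stop_id": "MACE"}],
)
```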

Routes : Adding new stop to sequence not working

The "Add" button actions next to the stop choosers were still taking value from the older autocomplete inputs (stop2add-0, stop2add-1). Update them to take values of the newer stop selectors (stopChooser0, stopChooser1).

Stops map : make divIcons, show initials of stops

In Stops page map, do the same divIcon technique used in Routes page maps.

And show the first letter of each stop or something.

Handle circular / one direction routes

If a route is only one-directional (like circular) then need to handle that. Give the user a way to specify that when deciding sequence.

Add bulk entries by uploading CSV or pasting tabular data

For stops, routes, trips, fares, timings etc sections, give user an option to add entries in bulk by uploading a CSV file, or copy-pasting tabular data (tab-separated values) from an excel. At present, apart from full GTFS feed upload, the user has to manually add each entry. But what if they already have the data arranged in a table on their end and can name their headers to match ours?

Plus: we can additionally give them an option to download the presently loaded data in the tabulator table as CSV.

Concern : this feature will need to run diagnostics and validate the bulk data. It will also need to check for unique entries or fields.

In cases where part of the bulk data is fresh entries to be added and part seems to be an edit of existing ones, users will have to be prompted with the statistics and before/after data and have to give consent for the changed entries. Either that or we give adequate disclaimers that existing data will be overwritten if the key fields are the same etc.

Rejection: Rejection of bulk added entries could be on grounds like:

Mandatory fields not filled
It's for a different route than the one currently loaded (in Schedules > Trips)
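
A rough sketch of the diagnostics step, assuming the pasted data has already been parsed into a list of dicts. The function name, its parameters and the four group names are hypothetical; a real version would also need to check for duplicate keys within the pasted data itself.

```python
def classify_bulk_rows(existing, incoming, key, required, route_id=None):
    """Split pasted rows into new, changed, unchanged and rejected groups."""
    existing_by_key = {row[key]: row for row in existing}
    result = {"new": [], "changed": [], "unchanged": [], "rejected": []}
    for row in incoming:
        missing = [name for name in dict.fromkeys([key, *required])
                   if row.get(name) in (None, "")]
        if missing:
            result["rejected"].append((row, "missing " + ", ".join(missing)))
        elif route_id is not None and row.get("route_id") != route_id:
            result["rejected"].append((row, "different route than the one loaded"))
        elif row[key] not in existing_by_key:
            result["new"].append(row)
        elif row != existing_by_key[row[key]]:
            # Pair old and new so the user can review before/after and consent
            result["changed"].append((existing_by_key[row[key]], row))
        else:
            result["unchanged"].append(row)
    return result


existing_trips = [{"trip_id": "T1", "route_id": "R001", "service_id": "WK"}]
pasted = [
    {"trip_id": "T1", "route_id": "R001", "service_id": "ALL"},  # edit of T1
    {"trip_id": "T2", "route_id": "R001", "service_id": "WK"},   # fresh entry
    {"trip_id": "T3", "route_id": "R002", "service_id": "WK"},   # other route
    {"trip_id": "", "route_id": "R001", "service_id": "WK"},     # no trip_id
]
report = classify_bulk_rows(
    existing_trips, pasted, key="trip_id",
    required=["route_id", "service_id"], route_id="R001",
)
```

The counts in each group give the statistics to show the user before anything is written to the DB.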

Routes sequence maps : limit max zoom when fitting map to sequence or shape

All .fitBounds(... commands need to have options argument :
{padding:[20,20], maxZoom:14}

Like:
map.fitBounds(sequenceLayer[0].getBounds(), {padding:[20,20], maxZoom:14});

Misc section : agency and calendar entries have other linkages, can't directly delete

Need to disable the delete columns in the Agency and Calendar tabulator tables on front-end and refer users to the Maintenance section.

Calendar service is already provisioned in maintenance section; agency_id deletion and renaming need to be provisioned. This will involve re-coding in both frontend JS and backend.

Export GTFS function: use Pandas instead of CSV writer

Operative code is around here: https://github.com/WRI-Cities/static-GTFS-manager/blob/v1.0.0/GTFSserverfunctions.py#L44
It calls the function csvwriter.

Current limitation : To know what columns to create in the csv, the csvwriter function only reads the first row in the table array. In the event that there are more fields further down the data, they will not make it to the exported feed.

Proposed solution: I haven't confirmed this yet, but I believe a Pandas dataframe would handle this better: it would create columns for any and all keys it encounters throughout the array (a list of dicts, to be precise). I have already changed the GTFS Import mechanism over to Pandas for this same reason. It reads numbers and stores them in the db as numbers, whereas csvreader was casting everything as text, and that makes it straightforward to run numerical comparisons etc on the data.
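
To illustrate the fix without committing to Pandas, here is a sketch that uses the standard library's csv.DictWriter instead and collects the union of keys across all rows before writing. This is a swapped-in technique, not the Pandas approach proposed above, and write_table_csv is a hypothetical name.

```python
import csv
import io


def write_table_csv(rows, out):
    """Write a list of dicts as CSV, with a column for every key seen in any row."""
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)  # keep first-seen column order
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # rows lacking a key get an empty cell
    return fieldnames


# The second row has a field that the first row lacks
table = [
    {"stop_id": "ALVA", "stop_name": "Stop A"},
    {"stop_id": "MACE", "stop_name": "Stop B", "zone_id": "Z1"},
]
output = io.StringIO()
columns = write_table_csv(table, output)
```

Reading only the first row would have dropped zone_id here; scanning every row keeps it.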

Note: links point to v1.0.0 though further development will happen in the master branch, so that these links (which point to line numbers in the file) don't break when the code eventually changes.

Sequence : show other stops on map and let user click and add to sequence

Sequences management: Load other stops greyed out or so on the map, user can click them to add to sequence. Desired UX: Compose a route by clicking stops on the map instead of searching by id/name.

Schedules: Route selector: don't trigger GET request for No Selection

schedules.html : On choosing "No Selection" option in the Route picker, backend gets an API call. Logs:

trips has 0 rows for route_id = No Selection
/API/trips GET call took 0.37 seconds.

At JS side itself if the value is "No Selection" then it should blank out the table and exit without making a GET request.

Routes : colorpicker for choosing colors

Routes : colorpicker for choosing colors of the route.

Provision Frequency based schedules

https://developers.google.com/transit/gtfs/reference/#frequenciestxt

If we want this tool to be used by bus agencies etc then we need to provision the frequencies.txt feature.

Use case scenario and Advantage

Suppose a certain 'trunk' route plies every 20 minutes, from 6am in the morning to 11.30pm at night. Both ways. For provisioning this route, we would need to make this many new entries in trips.txt (Schedules > Trips):

6am to 11.30pm => 0600 to 2330 =>1730 =>17.5 hrs.
20 mins => 3 times an hour.
trips = 3 x 17.5 = 52.5, rounded down to 52 trips.
Both directions => 104 trips.

That's not all. Suppose there are 30 stops along this route. Then, number of entries to be made in stop_times.txt (Schedules > Timings) : 30 x 104 = 3120.
And that's weekdays. If for the weekend service there are say 80 trips in a day, then another 2400 entries to be made for weekend.

And for all those entries we need to calculate what time each trip will begin, and interpolate the arrival/departure times for every stop. So, 5520 calculations to be done in total.

So one route running every 20 minutes needs total:

1 entry in routes.txt
184 entries in trips.txt
5520 entries in stop_times.txt, all with different values of timings.
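
The same arithmetic as a short script, using only the figures given above (20-minute headway, 17.5 service hours, 30 stops, 80 weekend trips). The variable names are just for illustration.

```python
service_hours = 17.5          # 06:00 to 23:30
trips_per_hour = 60 // 20     # one trip every 20 minutes
stops_on_route = 30
weekend_trips = 80

# 3 x 17.5 = 52.5, rounded down to 52 trips in one direction
trips_one_way = int(trips_per_hour * service_hours)
weekday_trips = 2 * trips_one_way                 # both directions

weekday_stop_times = stops_on_route * weekday_trips
weekend_stop_times = stops_on_route * weekend_trips

total_trips = weekday_trips + weekend_trips
total_stop_times = weekday_stop_times + weekend_stop_times
```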

One-time exercise? Ok then, suppose at some point the route is bumped up to plying every 10 minutes during just the morning and evening rush hours.

Transit agency boss : "What's the trouble? We're just changing the frequency for some time period."
Person in charge of updating GTFS feed: "#W$#@$#%#!$@!#@!##"

Now you may say "so what, since it's a repeating pattern let's automate it". But if it's a repeating pattern, why can't the app reading and interpreting the GTFS feed automate it? The GTFS feed itself should be restricted to carrying information that cannot be auto-generated.

For this reason, frequencies.txt is introduced in the static GTFS standard.

Let's take the same 20-min frequency route again. Instead of several trips, only two trips (one per direction) are entered in trips.txt. And then, in frequencies.txt, make two entries for the two directions:

trip_id	start_time	end_time	headway_secs	exact_times
R1_0	06:00:00	23:30:00	1200	0
R1_1	06:00:00	23:30:00	1200	0

Then, in stop_times.txt, just one run along each direction is recorded (30 stops x 2 direction = 60 entries total) with the starting stop's arrival time set to 00:00:00 and subsequent times entered as an offset from that.

So now, the route running every 20 minutes needs total:

1 entry in routes.txt
4 entries in trips.txt (considering 2 for weekdays + 2 for weekend service)
4 entries in frequencies.txt
120 entries in stop_times.txt (30 stops x 4 trips)

Looks more manageable, right?

Then, suppose the transit agency decides to double the frequency during rush hour, say 8 to 10am and 6 to 8 pm. The only edit needed now is in frequencies.txt. The day is now split into 5 time periods :

trip_id	start_time	end_time	headway_secs
R1_0	06:00:00	08:00:00	1200
R1_0	08:00:00	10:00:00	600
R1_0	10:00:00	18:00:00	1200
R1_0	18:00:00	20:00:00	600
R1_0	20:00:00	23:30:00	1200

... and similarly for the return journey.
Hence, a frequencies feature can greatly help transit agencies that have some or all routes running on frequencies anyway. Plus, the GTFS feed would be much smaller, so the program would work faster.
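
For illustration, this is roughly the expansion that an app reading the feed would do with frequencies.txt rows. It is a sketch with assumptions: end_time is treated as exclusive, the rush hours are 8 to 10 am and 6 to 8 pm as stated in the text, and with exact_times = 0 the real departures are only approximate. The function names are hypothetical.

```python
def to_seconds(hhmmss):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hhmmss(total):
    """Convert seconds since midnight back to 'HH:MM:SS'."""
    return "%02d:%02d:%02d" % (total // 3600, total % 3600 // 60, total % 60)


def expand_frequencies(rows):
    """List the (trip_id, departure time) pairs implied by frequencies.txt rows."""
    departures = []
    for row in rows:
        start = to_seconds(row["start_time"])
        end = to_seconds(row["end_time"])  # assumed exclusive
        for moment in range(start, end, int(row["headway_secs"])):
            departures.append((row["trip_id"], to_hhmmss(moment)))
    return departures


onward_frequencies = [
    {"trip_id": "R1_0", "start_time": "06:00:00", "end_time": "08:00:00", "headway_secs": 1200},
    {"trip_id": "R1_0", "start_time": "08:00:00", "end_time": "10:00:00", "headway_secs": 600},
    {"trip_id": "R1_0", "start_time": "10:00:00", "end_time": "18:00:00", "headway_secs": 1200},
    {"trip_id": "R1_0", "start_time": "18:00:00", "end_time": "20:00:00", "headway_secs": 600},
    {"trip_id": "R1_0", "start_time": "20:00:00", "end_time": "23:30:00", "headway_secs": 1200},
]
departures = expand_frequencies(onward_frequencies)
```

Each departure time would then be added to the offsets stored in stop_times.txt to get the timings at every stop.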

Quietly skip any mismatching stops in sequence db

Ref #6
This sort of error (a stop_id in the sequence db that is not found in the main db) may happen again in some other way. The sequence db is an extra thing that is not part of the GTFS feed spec; it is created and kept by this program to help standardize the stops sequence for a route and keep a template ready for when a new trip is provisioned. It should not be allowed to fail the program. Therefore, the code reading it should quietly skip any non-matching stop_id it encounters and move on to the next stop in the sequence. This is lower priority, so moving it to another issue.
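
A minimal sketch of the skipping behaviour, assuming the stops table has been loaded into a dict keyed by stop_id. The function name load_sequence_stops is hypothetical and is not the existing reader code.

```python
def load_sequence_stops(sequence_stop_ids, stops_by_id):
    """Resolve a saved sequence to stop records, skipping unknown stop_ids."""
    loaded, skipped = [], []
    for stop_id in sequence_stop_ids:
        stop = stops_by_id.get(stop_id)
        if stop is None:
            skipped.append(stop_id)  # stale entry: note it and move on
            continue
        loaded.append(stop)
    return loaded, skipped


stops_by_id = {"ALVA": {"stop_id": "ALVA"}, "MACE": {"stop_id": "MACE"}}
# "GONE" stands for a stop that was deleted after the sequence was saved
loaded, skipped = load_sequence_stops(["ALVA", "GONE", "MACE"], stops_by_id)
```

Returning the skipped ids lets the caller log them or warn the user without failing the request.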

Fares: Show simple Fare Rules table also

As of v1.3.0, the fare rules tab shows a pivoted table. This can be easier for editing fares, but we cannot add a new fare rule here if, for example, a new zone_id is configured.

Way forward: Create a new tab "Fare Rules - Simple". Here, load a simple linear tabulator table that shows the fare rules as they are in the GTFS spec: https://developers.google.com/transit/gtfs/reference/#fare_rulestxt

This requires provisioning a new tabulator table and related actions on the JS side, a new API call to the backend, and a new API handler endpoint on the backend side that simply reads the full fare_rules table and returns it as JSON.

Schedules: Time picker for timings

Schedules: Time picker for timings entry or edit.

Routes Sequence : tell if default sequence is already saved or not

Tell the user whether the default sequence is already saved, and that with the default sequence saved they can create new trips in the Schedules section.

This may involve some Python side tweaks also as AFAIK currently the JS API call function has no way of telling if this is a saved sequence or auto-generated one.

Would be similar to how in Schedules > Timings, for a chosen trip, user is told if the timings data was pre-stored or has been auto-generated.

Slip in a note informing the user that saving the default sequence is needed to be able to provision new trips under the route.

Use tabulator's own ajax loader to load data at some places

See /xml2GTFS.html Stations section. Tabulator is loading the data via ajax GET request by itself. The advantage : less coding needed in /js/xml2GTFS.js

It also has on-load functions to trigger things after the data has loaded or if it fails to load. See http://tabulator.info/docs/3.4#callbacks-ajax

We can do this on all the pages where a tabulator table has to be populated on page load.

Routes > Sequence : Repopulate shapes dropdowns on maps

When a new route is loaded, or a sequence saved, or a new shape uploaded, repopulate the dropdown options in the shape pickers on the onward and return sequence maps.

Standardize Instructions accordion placement in each page

Top? Bottom? Under the title? Make up your mind!!

Misc: Maintenance : include fare_ids also

Misc > Maintenance section: include fare_ids also.

Maintenance: replaceIDfunc : skip table if no records

If there are no matching records in a table to replace, then skip that table-key pair and move to next.

Currently it is erroring out if there are no records:

[{'table': 'calendar', 'key': 'service_id'}, {'table': 'trips', 'key': 'service_id'}]
WK
ALL
ERROR:tornado.application:Uncaught exception POST /API/replaceID?pw=kmrl&valueFrom=WK&valueTo=ALL (::1)
HTTPServerRequest(...)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tornado/web.py", line 1510, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "web_wrapper.py", line 1120, in post
    returnMessage = replaceIDfunc(valueFrom,valueTo,tableKeys)
  File "<string>", line 731, in replaceIDfunc
  File "/usr/local/lib/python3.5/dist-packages/tinydb/database.py", line 503, in write_back
    if sorted(doc_ids)[-1] > self._last_id:
IndexError: list index out of range
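
A sketch of the skip logic, using a plain dict of lists as a stand-in for the database so that it does not depend on any particular TinyDB call. The function name replace_id is hypothetical; the table-key pairs have the same shape as the list printed in the log above.

```python
def replace_id(db, value_from, value_to, table_keys):
    """Rename an id across tables, skipping tables with no matching records."""
    counts = {}
    for pair in table_keys:
        table, key = pair["table"], pair["key"]
        matches = [row for row in db.get(table, []) if row.get(key) == value_from]
        if not matches:
            counts[table] = 0
            continue  # nothing to write back for this table-key pair
        for row in matches:
            row[key] = value_to
        counts[table] = len(matches)
    return counts


# Stand-in database: calendar has no WK record, trips has one
db = {
    "calendar": [{"service_id": "ALL"}],
    "trips": [{"trip_id": "T1", "service_id": "WK"}],
}
table_keys = [
    {"table": "calendar", "key": "service_id"},
    {"table": "trips", "key": "service_id"},
]
counts = replace_id(db, "WK", "ALL", table_keys)
```

With the real database, the same check would sit just before the write-back call that raised the IndexError.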

Database brainstorming

The program is currently (as of 12-Apr-2018) using TinyDB to manage its database. The database is a db.json file in the GTFS folder. (Actually there's another one too, sequence.json.)

Some design considerations went into this decision:

Schema-less database

It is possible that transport operators may want to store additional columns in their gtfs files, or opt out completely of some of the optional columns defined in the specs.
It is also possible for columns to be introduced / phased out over time and not be defined from the beginning.
To accommodate for this variability, a schema-less database is preferred, where you do not have to set the columns "in stone" from the beginning.
For these reasons, I didn't go with SQLite and looked for file-based NoSQL solutions. Am using TinyDB for now, but open to alternatives.
This being said, it then becomes imperative for the program to ensure that the required columns as per specs are kept so that published GTFS feed is valid.

Portability

Portability of code is a key requirement for this project.
We should be able to put the folder into a pen drive, take it elsewhere and run it with no or minimal installation requirements (working on that on python dependencies side..).
Especially startup configurations, which will vary from system to system, need to be avoided. Apart from other factors, they are very annoying, especially for non-IT folks.
Dependency on other servers is to be avoided, which is why I could not consider MongoDB as a database option. The database needs to be portable, file-based and should move with the program.
In the use case scenario of transport system operators, the person put in charge of handling the data may change.

Challenges with this db choice:

Tried it with a much larger GTFS feed that has over 5000 stops and many hundred routes. The db file inflates to some 100s of MBs, and the API calls become painfully slow. So as of v.1.0.0, we can use this program with smaller GTFS feeds but not larger ones.
I have not yet tried the caching etc options of tinydb.

Front end limitations

As of v1.0.0, though importing of GTFS feeds with varying columns is supported, these additional columns will not be visible at the web interface. Practically all tables have fixed column definitions. The data loaded would have the additional fields, and when it is edited and written back to DB, these additional fields will follow with it. But the user won't see them, and when new entries are created the additional fields will not be assigned to them.

Put Loader animations wherever there are API calls made and waiting for results

Possible source for code: https://codepen.io/aurer/pen/jEGbA

Shapes : Draw on page, multiple formats

Shapes: Give an option to draw on the page itself instead of uploading a shapefile.
Also, keep multiple formats shapefile upload : .kml, .gpx in addition to .geojson

Routes: system-generate new route_id

Tie it to agency.
Example: KMRL_R001, KMRL_R002,...

User has to pick the agency, then just click Add Route button.
User can go rename it from the Maintenance section.
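
One possible way to generate the id, assuming the AGENCY_Rnnn pattern from the example above and that the existing route_ids are available as a list. The function name next_route_id and the width parameter are hypothetical.

```python
import re


def next_route_id(agency_id, existing_route_ids, width=3):
    """Return the next unused route_id of the form <agency_id>_R<number>."""
    pattern = re.compile(r"^%s_R(\d+)$" % re.escape(agency_id))
    numbers = []
    for route_id in existing_route_ids:
        match = pattern.match(route_id)
        if match:
            numbers.append(int(match.group(1)))
    # Ids that do not follow the pattern (e.g. renamed routes) are ignored
    next_number = max(numbers, default=0) + 1
    return f"{agency_id}_R{next_number:0{width}d}"


first_id = next_route_id("KMRL", [])
third_id = next_route_id("KMRL", ["KMRL_R001", "KMRL_R002", "R001"])
```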

Translations: Give dropdown of existing names from across the system

Any translation not done yet or not done in the picked language should come in a dropdown.

Implies : don't allow them to translate any random string.

Sequence : altered sequence order not saving to DB

In Routes page > Sequence (Onward and Return)
After moving a stop up or down in the sequence table, upon saving the changed sequence wasn't being saved to DB. This was because global variables sequence0 and sequence1 in routes.js were not saving the changed rows.

Realized that the code doesn't need global variables to begin with. Data can be retrieved from tabulator tables at any time. So, changing the other functions in js/routes.js to not use the global sequence variables and instead work directly with the tabulator tables.

Misc: Maintenance : updated dropdowns

https://harvesthq.github.io/chosen/options.html#triggerable-events
Use the chosen:updated trigger

Bug: After deleting, the dropdowns' underlying HTML is updated, but the chosen.js plugin acting on them is not, which leads to misleading selections (you choose option x, but another option is actually selected).

Sequence saving when no shapes

For brand new routes, saving sequence API call is crashing because of no shapes. Shape the code (pun) such that it is able to gracefully handle it in case there aren't any shapes allotted to the new route's directions.

New trip creation: should populate other fields too

Current fields list for trips table: route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id,wheelchair_accessible

Of these, when we create a new trip in the Schedules page, only route_id and trip_id are populated. Also, at present they are filled by text input, whereas some of these should be fixed values depending on other tables (like service_id). It can cause trouble if saved without populating properly.

Maintenance: Shape delete : zap from sequence DB also

To do this, the key shape0/shape1 needs to be popped from the record. It won't be enough to set it to blank string.

How a sequence is saved in DB:

{
    "1": {
      "0": [
        "ALVA",
        ...
        "MACE"
      ],
      "1": [
        "MACE",
        ...
        "ALVA"
      ],
      "route_id": "R001",
      "shape0": "R001_0",
      "shape1": "R001_1"
    }
  }
}
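
A minimal sketch of popping the key, assuming the sequence records have been read into a list of dicts shaped like the record above. The function name is hypothetical.

```python
def remove_shape_from_sequences(sequence_records, shape_id):
    """Pop shape0/shape1 from every sequence record that points at shape_id."""
    removed = 0
    for record in sequence_records:
        for key in ("shape0", "shape1"):
            if record.get(key) == shape_id:
                record.pop(key)  # remove the key itself, not just blank it
                removed += 1
    return removed


records = [
    {"0": ["ALVA", "MACE"], "1": ["MACE", "ALVA"],
     "route_id": "R001", "shape0": "R001_0", "shape1": "R001_1"},
]
removed = remove_shape_from_sequences(records, "R001_0")
```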

Schedules : Timings : allow changing order, adding or removing stops in the trip

Present workflow:

User has to first go to Routes section and "finalize" the onward and return journey's sequence for the route in question. Here they can make changes to the sequence, add or remove stops.
This creates a "default" sequence in the system's internal database, which is then referred to in the Schedules section when the user creates a new trip for that route.
The system loads up the generic sequence, and timings are starting from dummy values 00:00:00 and end at 01:00:00.
The user has to edit and feed in the timings.
Presently, we're not letting the user customize the trip by moving rows around, change sequence numbers etc. Why? Because we still need to code the validation needed for this.

Therefore, task at hand:

Validation for a trip's set of entries into stop_times table as per GTFS spec.
And THEN we can allow the user to edit the trip's timings data fully.
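
A partial sketch of what that validation might check, assuming a trip's rows arrive in table order as a list of dicts. It covers only three rules from the GTFS spec (stop_sequence must increase, times must not go backwards, the first and last stops need both times); the function names are hypothetical.

```python
def to_seconds(hhmmss):
    """Convert 'HH:MM:SS' to seconds; hours may exceed 24, as GTFS allows."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def validate_trip_stop_times(rows):
    """Return a list of problems found in one trip's stop_times rows."""
    if not rows:
        return ["trip has no stop_times rows"]
    errors = []
    previous_sequence = None
    previous_time = None
    for row in rows:
        sequence = int(row["stop_sequence"])
        if previous_sequence is not None and sequence <= previous_sequence:
            errors.append("stop_sequence %d does not increase" % sequence)
        previous_sequence = sequence
        for field in ("arrival_time", "departure_time"):
            value = row.get(field)
            if not value:
                continue  # intermediate stops may leave times blank
            current = to_seconds(value)
            if previous_time is not None and current < previous_time:
                errors.append("%s at stop_sequence %d goes back in time"
                              % (field, sequence))
            previous_time = current
    for label, row in (("first", rows[0]), ("last", rows[-1])):
        if not row.get("arrival_time") or not row.get("departure_time"):
            errors.append("%s stop needs arrival_time and departure_time" % label)
    return errors


example_trip = [
    {"stop_sequence": 1, "stop_id": "ALVA",
     "arrival_time": "00:00:00", "departure_time": "00:00:00"},
    {"stop_sequence": 2, "stop_id": "MACE",
     "arrival_time": "01:00:00", "departure_time": "01:00:00"},
]
problems = validate_trip_stop_times(example_trip)
```

An empty list means the rows passed these checks; the front end could refuse to save while the list is non-empty.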

Things that will be needed on user interface end:

Stops dropdown selection.
Time picker... #20
Possibly.. distance or speed estimator.

Python side: handle empty tables gracefully

While managing existing data is working well, a lot of errors are encountered when we want to create new data like routes etc. Also, the Misc > Maintenance section's all-IDs API call errors out when the DB has been reset to a blank slate.

Undo/Redo buttons info, hide/unhide only when table is edited

Wherever they are used, make them visible on the page only when the table is edited. Make them disappear when changes are saved / committed to DB (big green button).

Maintenance: deletefromDB : skip table if no records

For both deleting and zapping, skip if there are no existing records in that table for that key and value pair.

Introduce new colors for the buttons, alerts etc

link for possible CSS: https://www.bootply.com/112999

Improve upload shapefile popup in Routes > Sequence

See: https://jqueryui.com/dialog/#default
http://api.jqueryui.com/dialog/

The same technique may also be applied for creating a new trip etc.

Fares : ordering of stops / zones

Fares: Fare Rules : Presently loading in alphabetical order of stations. Explore if possible to load in a sequence, and decide which sequence.
Idea: Have an expandable (accordion) section for filtering down the stops. There, show a routes listing table. User selects routes (can select multiple routes) and presses a filter button. That restricts the fare rules table to only the stops that are covered by those routes.
Why multiple routes, why not a single route: For interchanges of course!
-> Multiple routes select : can achieve through chosen.js