cutr-at-usf / gtfs-realtime-validator

Java-based tool that validates General Transit Feed Specification (GTFS)-realtime feeds. See https://github.com/MobilityData/gtfs-realtime-validator for the latest!

License: Other

Java 69.33% HTML 3.38% CSS 0.18% JavaScript 27.09% Dockerfile 0.03%
gtfs gtfs-realtime-feed java transit public-transportation gtfs-realtime gtfs-realtime-data gtfs-realtime-validator

gtfs-realtime-validator's People

Contributors

antoineaugusti, barbeau, dependabot[bot], derhuerst, hbruch, kylegancarz, laidig, leonardehrenfried, mohangandhigh, nipuna-g, rjvitorino, suryakandukoori, thzinc

gtfs-realtime-validator's Issues

Jackson artifacts are missing

If you try to build and run the project (after manually doing mvn install for the gtfs-validator project, which is being addressed in https://github.com/CUTR-at-USF/gtfs-realtime-validator/pulls), you get the following build error:

Error:(28, 34) java: package com.fasterxml.jackson.core does not exist
Error:(29, 38) java: package com.fasterxml.jackson.databind does not exist
Error:(97, 83) java: cannot find symbol
  symbol:   class JsonProcessingException
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed
Error:(125, 9) java: cannot find symbol
  symbol:   class ObjectMapper
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed
Error:(125, 35) java: cannot find symbol
  symbol:   class ObjectMapper
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed

Looks like we're missing the Jackson dependencies in the pom.xml. I'll add them shortly.

Should Service Alert IDs change when alert content changes?

It's not clear to me whether there is currently a best practice surrounding identifying changes and maintaining IDs within published GTFS-rt service alerts.

For example, it's common for an agency to publish an alert, and then make changes or updates to that same alert. In OneBusAway, we show new alerts to riders, and then mark them as "read" after the user sees it, and we completely hide it from view when the user requests that. This is all based on the situation_id from the OBA REST API response, which looks like this:

"id": "Hillsborough Area Regional Transit_ee2415bb-8c2b-4eb7-b9a0-b268110254ea"

API request:
http://api.tampa.onebusaway.org/api/api/where/arrivals-and-departures-for-stop/Hillsborough%20Area%20Regional%20Transit_6497.json?minutesAfter=35&version=2&key=TEST

API response including situation (alert):

"references": {
  "agencies": [],
  "routes": [],
  "situations": [
    {
      "activeWindows": [],
      "allAffects": [
        {
          "agencyId": "Hillsborough Area Regional Transit",
          "applicationId": "",
          "directionId": "",
          "routeId": "",
          "stopId": "",
          "tripId": ""
        }
      ],
      "consequences": [],
      "creationTime": 1478195507024,
      "description": {
        "lang": "en",
        "value": "Travel between HART and PSTA with one fare. Download the Flamingo Fares app for mobile device and pay your fare via smartphone. 3-Day unlimited regional pass for $11. Check out www.gohart.org or www.psta.net for more information."
      },
      "id": "Hillsborough Area Regional Transit_ee2415bb-8c2b-4eb7-b9a0-b268110254ea",
      "publicationWindows": [],
      "reason": "OTHER_CAUSE",
      "severity": "noImpact",
      "summary": {
        "lang": "en",
        "value": "Flamingo Fares"
      },
      "url": null
    },
    ...

I need to look into how we're generating the UUID of this alert for OBA - I'm guessing if any of the content changes then this UUID changes. From what I can tell there isn't any guidance in GTFS-rt for maintaining the same ID in the GTFS-rt feed for the same message. For example, if HART made a spelling error in the above alert and fixed it, I believe it would show up as a new UUID from OBA and users would need to acknowledge it as being read again.

Should we recommend that the same GTFS entity ID be maintained if there are no significant changes to the message, so users wouldn't need to acknowledge a new alert that has minor differences from a past alert?
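
To make the concern concrete, here is a minimal sketch of what happens if an ID is derived from the alert text. This is an assumption for illustration only: I haven't confirmed how OBA generates the UUID, and the class and method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class AlertIdSketch {
    // Hypothetical scheme: the ID is a name-based UUID over the alert text,
    // so any edit to the text (even a typo fix) yields a different ID.
    static String contentDerivedId(String agencyId, String summary, String description) {
        String content = summary + "\n" + description;
        UUID uuid = UUID.nameUUIDFromBytes(content.getBytes(StandardCharsets.UTF_8));
        return agencyId + "_" + uuid;
    }

    public static void main(String[] args) {
        String agency = "Hillsborough Area Regional Transit";
        // Same alert before and after fixing a spelling error ("betwen").
        String before = contentDerivedId(agency, "Flamingo Fares", "Travel betwen HART and PSTA with one fare.");
        String after = contentDerivedId(agency, "Flamingo Fares", "Travel between HART and PSTA with one fare.");
        System.out.println(before.equals(after)); // false: riders would see a "new" alert
    }
}
```

If the producer instead kept its own entity ID stable across minor edits, a consumer could key the "read" state on that ID and the typo fix would not resurface the alert.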

HART GTFS-rt service alerts examples:
http://api.tampa.onebusaway.org/api/api/gtfs_realtime/alerts-for-agency/Hillsborough%20Area%20Regional%20Transit.pbtext?key=TEST

Service Alert reference:
https://developers.google.com/transit/gtfs-realtime/reference/Alert

Use ORM tool

Use an ORM tool (Hibernate) to manage database connectivity. While the initial setup was manageable with direct JDBC calls, the validator has grown in complexity, which makes it difficult to maintain. In the next major version, Hibernate should be incorporated to make things easier to maintain.

Move existing wiki documentation for rules into repository

We have some existing documentation for the project in a wiki:
https://github.com/CUTR-at-USF/gtfs-realtime-validator/wiki

There are two existing pages related to rules:

  1. List of Validation Rules
  2. Rules and Test Cases

I'd like to move these into a new markdown file, RULES.md, within the repo so the documentation is versioned alongside the existing code. This makes it easier to track the addition of new rules and new documentation.

@mohangandhiGH As you're reviewing the existing rules that are implemented, can you please review this documentation and move it into the repo when we confirm it's implemented?

Check that the delay field is consistent with difference between the scheduled and predicted times

If not, this would generate an error. See https://groups.google.com/d/msg/gtfs-realtime/qNc7ButQbW8/tDwanvClBQAJ.

Note that this applies to stop_time_update.arrival/departure.delay, as well as trip_update.delay. I noticed that SDMTS is providing stop_time_update.departure.time, as well as trip_update.delay:

"entity": [
  {
    "id": "1",
    "trip_update": {
      "trip": {
        "trip_id": "12341185",
        "route_id": "30"
      },
      "stop_time_update": [
        {
          "departure": {
            "time": 1498664460
          },
          "stop_id": "95034"
        }
      ],
      "vehicle": {
        "id": "911"
      },
      "timestamp": 1498664286,
      "delay": -120
    }
  },
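
A rough sketch of the check, assuming the scheduled time from GTFS stop_times.txt has already been resolved to epoch seconds for the service day. The method name, the tolerance parameter, and the scheduled value in the example are all hypothetical; only the predicted time and the delay come from the feed excerpt above.

```java
class DelayConsistencySketch {
    // True if the reported delay matches (predicted - scheduled) within a tolerance.
    // A negative delay means the vehicle is early.
    static boolean isDelayConsistent(long scheduledEpochSec, long predictedEpochSec,
                                     int delaySec, int toleranceSec) {
        long observedDelay = predictedEpochSec - scheduledEpochSec;
        return Math.abs(observedDelay - delaySec) <= toleranceSec;
    }

    public static void main(String[] args) {
        long predicted = 1498664460L;      // departure.time from the feed excerpt
        long scheduled = predicted + 120;  // hypothetical schedule: 120 s after the prediction
        System.out.println(isDelayConsistent(scheduled, predicted, -120, 0)); // true
        System.out.println(isDelayConsistent(scheduled, predicted, 60, 0));   // false
    }
}
```

Whether trip_update.delay should be compared against every stop or only the next one is left open here; the sketch compares a single stop.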

Can't load GTFS data from a URL

When running the application, it won't load GTFS data from the provided URL.

Steps to reproduce:

  1. Clone master branch
  2. Run the application
  3. Enter the URL of a GTFS zip file

Reported by @scrudden. @scrudden let me know if I got any of the above incorrect.

@mohangandhiGH Could you please take a look at this?

ADDED trips can't be included in GTFS

Some vendors re-use trip_ids in GTFS-realtime when they add ad-hoc trips to the schedule. This results in non-unique trip_ids within the GTFS-realtime feeds and causes errors for consumers. New unique trip_ids (that don't collide with any existing GTFS trip_ids) must be created for any ADDED trips.
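
A minimal sketch of how this could be checked, assuming the set of trip_ids from GTFS trips.txt and the list of trip_ids marked ADDED in the realtime feed have already been extracted. The class and method names are hypothetical, not part of the validator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AddedTripIdSketch {
    // Returns the ADDED trip_ids that collide with a trip_id already in the GTFS schedule.
    static List<String> collidingAddedTripIds(Set<String> gtfsTripIds, List<String> addedTripIds) {
        List<String> collisions = new ArrayList<>();
        for (String tripId : addedTripIds) {
            if (gtfsTripIds.contains(tripId)) {
                collisions.add(tripId);
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        Set<String> gtfsTripIds = new HashSet<>(Arrays.asList("100", "101", "102"));
        List<String> added = Arrays.asList("101", "adhoc-1");
        System.out.println(collidingAddedTripIds(gtfsTripIds, added)); // [101]
    }
}
```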

gtfs-validator artifacts are missing

If you try to build the project with mvn package, you get the below error:

Error:(25, 49) java: package com.conveyal.gtfs.validator.json.backends does not exist
Error:(26, 40) java: package com.conveyal.gtfs.validator.json does not exist
Error:(27, 54) java: package com.conveyal.gtfs.validator.json.serialization does not exist
Error:(48, 40) java: package com.conveyal.gtfs.validator.json does not exist
Error:(157, 9) java: cannot find symbol
  symbol:   class FileSystemFeedBackend
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed
Error:(157, 45) java: cannot find symbol
  symbol:   class FileSystemFeedBackend
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed
Error:(158, 9) java: cannot find symbol
  symbol:   class FeedValidationResultSet
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed
Error:(158, 47) java: cannot find symbol
  symbol:   class FeedValidationResultSet
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed
Error:(160, 9) java: cannot find symbol
  symbol:   class FeedProcessor
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed
Error:(160, 39) java: cannot find symbol
  symbol:   class FeedProcessor
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed
Error:(168, 9) java: cannot find symbol
  symbol:   class JsonSerializer
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed
Error:(168, 41) java: cannot find symbol
  symbol:   class JsonSerializer
  location: class edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed

This is because the laidig fork of gtfs-validator doesn't have any hosted artifacts. We're looking at ways to address this. Some related work by @mohangandhiGH in #30 may fix this.

For now, the workaround is to clone https://github.com/laidig/gtfs-validator and do mvn install to install the artifacts in your local Maven repo before building this realtime validator project.

Add pagination for feeds checked

Add pagination to the feeds checked interface. This will allow the users to navigate to earlier errors. The current interface only shows the first 10 errors.

The back end has to be changed in order to support this behavior.

Set up CLA and CLA verifier

We will need some way to have contributors sign CLAs, just like the Google Transit repo requires. They have some automated script that adds "cla:yes" or "cla:no" tags based on whether the GitHub username has signed their CLA.

From @nipuna777:

Google seems to be using an Apps Script script to check whether the contributor's email is in a spreadsheet. (The emails get added to the spreadsheet when you accept the contributor's agreement.)
The instructions on how to implement this are given in the comments of the CLA verifier (https://github.com/angular/google-cla-verifier-for-github/blob/master/cla-verifier.gs). While this is for Angular, I'm thinking they are using the same thing for transit as well.
I've tried setting this up on a personal repo to see if it would work. But it seems that without access to the spreadsheet listing those who have signed the CLA, we can't use the exact script. Someone with access to that spreadsheet would have to write and run the Apps Script.

We're using Google forms for OneBusAway ICLA's:
https://docs.google.com/forms/d/12jV-ByyN186MuPotMvxJtNKtSaGGTnEHm8rXomM2bm4/viewform

...which stores the entered information into a Google Sheets document, so maybe we can do the same with this project.

Providers must include stop_sequence if the same stop is visited more than once in a trip

Currently, if a stop is visited more than once as part of a trip (e.g., a loop route):

  • stop_id = 1756, stop_sequence=1
  • stop_id = 1756, stop_sequence=99

...and the real-time producer doesn't provide stop_sequence for predictions on this trip that reference the duplicate stop (1756 above), consumers can't understand to which instance of the stop the predictions should be applied.

Therefore, providers should include stop_sequence in the real-time feed if the same stop is visited more than once in a trip.

This is currently a best practice, as the GTFS-rt spec doesn't currently explicitly say this. However, I've written a proposal here to add this requirement to the spec - google/transit#20. If this proposal is approved and merged, then we can make this a requirement.
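
A minimal sketch of the check described above, assuming the ordered stop_ids for the trip have been read from GTFS stop_times.txt and that a missing stop_sequence in the real-time update is represented as null. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopSequenceSketch {
    // Stops that appear more than once in a trip's ordered list of stop_ids.
    static Set<String> repeatedStops(List<String> tripStopIds) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new HashSet<>();
        for (String stopId : tripStopIds) {
            if (!seen.add(stopId)) {
                repeated.add(stopId);
            }
        }
        return repeated;
    }

    // An update is ambiguous if it references a repeated stop without a stop_sequence.
    static boolean isAmbiguous(String stopId, Integer stopSequence, Set<String> repeated) {
        return stopSequence == null && repeated.contains(stopId);
    }

    public static void main(String[] args) {
        // Loop route: stop 1756 is both the first and the last stop.
        Set<String> repeated = repeatedStops(Arrays.asList("1756", "1800", "1900", "1756"));
        System.out.println(isAmbiguous("1756", null, repeated)); // true
        System.out.println(isAmbiguous("1756", 99, repeated));   // false
        System.out.println(isAmbiguous("1800", null, repeated)); // false
    }
}
```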

Add Readme

@nipuna777 Could you please add a basic Readme to this repo that has the steps to build/run project?

Doesn't have to be detailed at this point, but should be enough for someone else to get it started. Thanks!

StopTimeUpdate - either arrival or departure time must be populated

See https://groups.google.com/forum/#!topic/onebusaway-developers/2cutzTOPkk4.

I was just searching the GTFS-rt spec, and I actually don't think that there is an explicit reference saying that one of the two fields (arrival or departure time) must be populated. I looked at:

Check that non-revenue trips/stops aren't included in TripUpdates and VehiclePositions feeds

Sometimes agencies include trips that are for training or non-revenue trips/stops in their GTFS-realtime feeds. These trips/stops are not included in the GTFS schedule data. These are not regularly scheduled trips/stops and are not available to the public. These trips/stops should not be included in the GTFS-realtime feeds.

This would generate a warning. See https://groups.google.com/d/msg/gtfs-realtime/qNc7ButQbW8/tDwanvClBQAJ.
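
One way this could be approximated, as a sketch only: flag any trip_id or stop_id in the real-time feed that has no match in the GTFS schedule data. The names below are hypothetical, and the sketch does not exempt trips that are legitimately absent from GTFS (for example, trips marked ADDED).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class MissingFromScheduleSketch {
    // Returns the real-time IDs (trip_ids or stop_ids) not found in the GTFS schedule, sorted.
    static Set<String> idsMissingFromSchedule(Set<String> gtfsIds, Collection<String> realtimeIds) {
        Set<String> missing = new TreeSet<>();
        for (String id : realtimeIds) {
            if (!gtfsIds.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> gtfsTripIds = new HashSet<>(Arrays.asList("100", "101"));
        // "TRAINING-7" stands in for a non-revenue trip that is not in the schedule.
        System.out.println(idsMissingFromSchedule(gtfsTripIds, Arrays.asList("100", "TRAINING-7"))); // [TRAINING-7]
    }
}
```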

Project roadmap

@nipuna777 Thanks for all your work on this project in the Google Summer of Code!

Could you provide a list of things you were able to accomplish, and what still needs to be done (by you or others) to bring this tool to a working state?

This will make it easier for others to see where we're currently at, and what still needs to be accomplished.

Per stop predictions - providers shouldn't drop arrivals until after the bus passes the stop

In the "Stop Time Updates" section, the GTFS-realtime spec (https://developers.google.com/transit/gtfs-realtime/trip-updates#stop-time-updates) says:

A trip update consists of one or more updates to vehicle stop times, which are referred to as StopTimeUpdates. These can be supplied for past and future stop times. You are allowed, but not required, to drop past stop times. When doing this, be aware that you shouldn't drop a past update if it refers to a trip that isn't yet scheduled to have finished (i.e. it finished ahead of schedule) as otherwise it will be concluded that there is no update on this trip.

My interpretation of the highlighted portion is that when multiple StopTimeUpdates exist in a trip (i.e., per-stop predictions), individual StopTimeUpdates shouldn't be dropped from the GTFS-rt feed if the vehicle is running ahead of schedule until after the scheduled arrival time for that stop (from GTFS stop_times.txt).

For example, if the following data appears in the GTFS-rt feed:

Stop 4 - Predicted at 10:18am (scheduled at 10:20am, 2 min early)
Stop 5 - Predicted at 10:30am (scheduled at 10:30am, on time)

...the prediction for Stop 4 cannot be dropped from the feed until 10:21am, even if the bus actually passes the stop at 10:18am. If the StopTimeUpdate for Stop 4 was dropped from the feed at 10:18am or 10:19am, and the scheduled arrival time is 10:20am, then the consumer should assume that no real-time information exists for Stop 4 at that time.

A vendor is arguing that this text applies only to the TripUpdate, not to the StopTimeUpdates, and they are allowed to drop updates for vehicles running early as soon as the vehicle passes the stop (no matter what the scheduled arrival time is). If the vendor's interpretation is correct, it results in a very poor end user experience for consumers - for example, if a vehicle was running 5 minutes early, and the user checks their app 4 minutes before the scheduled arrival time, the app would only show scheduled information, and would show that the vehicle was expected to arrive in 4 minutes (even though at a system level we know that the vehicle already passed the stop).

Also, in OneBusAway, riders like to see negative ETAs that indicate that a bus just left, so they know if they just missed a bus (vs. facing the unknown and wondering if they actually just missed the bus, or if the system doesn't have information about that bus). However, we can't show these negative arrivals (early, on time, or late) if the producer drops the update as soon as the vehicle passes, as we would only show scheduled negative arrival times. Technically dropping on time or late arrivals after the vehicle passes the stop is allowed by the GTFS-rt spec, but in terms of best practices I would recommend that they remain in the feed for at least another few minutes and/or stops.

Related proposal to clarify the GTFS-rt spec here - google/transit#16.

The problems this created in OneBusAway are outlined here - OneBusAway/onebusaway-application-modules#162.

Discussion on the GTFS-realtime group - https://groups.google.com/forum/#!topic/gtfs-realtime/3rAf6UIhAsQ.

So, new rule that's required behavior:

  • Producers should not drop a past StopTimeUpdate if it refers to a stop with a scheduled arrival time in the future for the given trip (i.e. the vehicle has passed the stop ahead of schedule)

New rule that's optional behavior (best practice):

  • Providers should not drop late or on time arrivals for a stop until several minutes and/or stops after the bus passes the stop
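
The required rule could be checked by comparing consecutive feed iterations, roughly as sketched below. This assumes the stop_ids with a StopTimeUpdate in the previous and current iteration of a trip are available, along with scheduled arrival times from GTFS stop_times.txt. The names are hypothetical, times are simplified to seconds since midnight, and keying by stop_id assumes each stop is visited only once in the trip.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DroppedUpdateSketch {
    // Stops that had an update in the previous iteration, have none now,
    // and whose scheduled arrival time is still in the future.
    static List<String> prematurelyDropped(Set<String> previousStops, Set<String> currentStops,
                                           Map<String, Long> scheduledArrivalSec, long nowSec) {
        List<String> dropped = new ArrayList<>();
        for (String stopId : previousStops) {
            Long scheduled = scheduledArrivalSec.get(stopId);
            if (!currentStops.contains(stopId) && scheduled != null && scheduled > nowSec) {
                dropped.add(stopId);
            }
        }
        Collections.sort(dropped);
        return dropped;
    }

    public static void main(String[] args) {
        Map<String, Long> scheduled = new HashMap<>();
        scheduled.put("4", 37200L); // 10:20am
        scheduled.put("5", 37800L); // 10:30am
        Set<String> previous = new HashSet<>(Arrays.asList("4", "5"));
        Set<String> current = new HashSet<>(Arrays.asList("5"));
        System.out.println(prematurelyDropped(previous, current, scheduled, 37140L)); // 10:19am -> [4]
        System.out.println(prematurelyDropped(previous, current, scheduled, 37260L)); // 10:21am -> []
    }
}
```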

Validate that stop_ids referenced by GTFS-rt feeds have a location_type of 0

Type: Error (Implicitly stated in the GTFS specifications)
Description: All stop_ids referenced in GTFS-rt feeds must have the "location_type" = 0
Affected Feed Type(s): TripUpdate, VehiclePosition
Reference(s): https://developers.google.com/transit/gtfs/reference?hl=en#stop_timestxt

Notes: If we check that every stop_id in the GTFS-rt feed has a location_type of 0, that also confirms that every stop_id referenced in that iteration is valid. Would we still need to check the GTFS feed for the rule
"If location_type is used in stops.txt, all stops referenced in stop_times.txt must have location_type of 0."?

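A minimal sketch of this rule, assuming stops.txt has been loaded into a map from stop_id to location_type, with a blank location_type stored as 0. The names are hypothetical. A stop_id missing from the map is flagged as well, which reflects the point in the notes that this check also covers stop_id validity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LocationTypeSketch {
    // Returns referenced stop_ids that are unknown or have a non-zero location_type.
    static List<String> invalidStopIds(Map<String, Integer> locationTypeByStopId,
                                       Collection<String> referencedStopIds) {
        List<String> invalid = new ArrayList<>();
        for (String stopId : referencedStopIds) {
            Integer locationType = locationTypeByStopId.get(stopId);
            if (locationType == null || locationType != 0) {
                invalid.add(stopId);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        Map<String, Integer> stops = new HashMap<>();
        stops.put("95034", 0);   // regular stop
        stops.put("STATION", 1); // station, not valid in a GTFS-rt reference
        System.out.println(invalidStopIds(stops, Arrays.asList("95034", "STATION", "unknown")));
        // [STATION, unknown]
    }
}
```
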
Change numbering scheme for errors and warnings

The current numbering scheme will not hold up if the number of rules grows large. Also, the IDs do not convey any details of the error (e.g., e001 only indicates that this is an error; additional details, such as the affected feeds, could be encoded in the ID).

Changing the numbering scheme before the first milestone would be easier than attempting to do so down the road.

Stop_times.timepoint should not be specified for frequencies.txt exact_times=0 trips

There is an optional field in stop_times.txt, timepoint, that is used to specify timepoints that a transit vehicle should strictly adhere to (e.g., hold if it is running early). If a trip is defined in frequencies.txt with exact_times=0, then it is a frequency-based trip, which means that a schedule doesn't exist. Therefore, frequency-based trips should NOT have any timepoint=1 values in stop_times.txt: the field should be either 0 or blank, and a value of 1 would be an error.
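
As a sketch of the rule, assuming exact_times has been looked up for the trip in frequencies.txt (null if the trip is not listed there) and a blank timepoint is represented as null. The class and method names are hypothetical.

```java
class TimepointRuleSketch {
    // True if a stop_times.txt row violates the rule: the trip is frequency-based
    // (exact_times=0 in frequencies.txt) but the row has timepoint=1.
    static boolean violatesTimepointRule(Integer exactTimes, Integer timepoint) {
        boolean frequencyBased = exactTimes != null && exactTimes == 0;
        boolean strictTimepoint = timepoint != null && timepoint == 1;
        return frequencyBased && strictTimepoint;
    }

    public static void main(String[] args) {
        System.out.println(violatesTimepointRule(0, 1));    // true: error
        System.out.println(violatesTimepointRule(0, null)); // false: blank timepoint is fine
        System.out.println(violatesTimepointRule(1, 1));    // false: exact_times=1 trip
        System.out.println(violatesTimepointRule(null, 1)); // false: trip not in frequencies.txt
    }
}
```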

See discussion at google/transit#47 (comment).

I believe this is purely a static GTFS issue, so we should open a new issue on the conveyal gtfs-validator project and implement the rule there:
https://github.com/conveyal/gtfs-validator
