itinero / gtfs
.NET implementation of a General Transit Feed Specification (GTFS) feed parser.
Home Page: http://www.itinero.tech
License: MIT License
Parsing a feed from txt files is slower than it could be. Look at how this can be improved.
Hello xivk,
I may be wrong, but according to my investigation GTFSReader().Read doesn't Dispose() the FileStream.
Here is a unit test to reproduce the issue:
using System;
using System.IO;
using GTFS.IO;
using NUnit.Framework;

namespace GTFS.Test
{
    [TestFixture]
    internal class FileNotDisposedTest
    {
        [Test]
        public void TestFileDisposed()
        {
            ReadAndParseGtfsSampleData();
            try
            {
                var fileWriter = File.OpenWrite(@"sample-feed/agency.txt");
            }
            catch (System.IO.IOException accessDeniedException)
            {
                /*
                 * System.IO.IOException : The process cannot access the file 'sample-feed\agency.txt'
                 * because it is being used by another process.
                 */
                throw;
            }
        }

        private void ReadAndParseGtfsSampleData()
        {
            try
            {
                const string inputFolderPath = @"sample-feed";
                var directoryInfo = new DirectoryInfo(inputFolderPath);
                var gtfsDirectorySource = new GTFSDirectorySource(directoryInfo);
                var reader = new GTFSReader<GTFSFeed>();
                reader.Read(gtfsDirectorySource);
            }
            catch (Exception ex)
            {
                var type = ex.GetType();
                if (!type.FullName.StartsWith("GTFS.Exceptions"))
                {
                    throw;
                }
            }
        }
    }
}
If the issue is real and not a misuse on my part, it could be fixed by adding a file.Dispose() at the end of GTFSReader.Read(IGTFSSourceFile file, T feed, EntityParseDelegate parser, EntityAddDelegate addDelegate) (but I haven't tested it yet).
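The disposal pattern suggested above can be sketched independently of the library. The `ReadAllLines` helper below is hypothetical and not part of GTFS.IO; it only illustrates that wrapping the stream in `using` releases the file handle as soon as reading ends, even when an exception is thrown.

```csharp
using System.Collections.Generic;
using System.IO;

public static class DisposeSketch
{
    // Hypothetical helper, not part of the GTFS library.
    public static List<string> ReadAllLines(string path)
    {
        var lines = new List<string>();
        // Both the stream and the reader are disposed when the block exits.
        using (var stream = File.OpenRead(path))
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

After `ReadAllLines` returns, opening the same file for writing should no longer fail with a sharing violation.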
Improve shape builder by:
An example of blind resolving and routing between stops:
http://geojson.io/#id=gist:anonymous/e5d32e97bd395824f66365d741bdc87d&map=17/51.25620/4.80518
According to the code route_desc is required, but according to the specifications it is not:
this.CheckRequiredField(header, header.Name, this.RouteMap, "route_desc");
see:
protected virtual Route ParseRoute(T feed, GTFSSourceFileHeader header, string[] data)
{
    // check required fields.
    this.CheckRequiredField(header, header.Name, this.RouteMap, "route_id");
    this.CheckRequiredField(header, header.Name, this.RouteMap, "route_short_name");
    this.CheckRequiredField(header, header.Name, this.RouteMap, "route_long_name");
    this.CheckRequiredField(header, header.Name, this.RouteMap, "route_desc");
    this.CheckRequiredField(header, header.Name, this.RouteMap, "route_type");
    // parse/set all fields.
    Route route = new Route();
    for (int idx = 0; idx < data.Length; idx++)
    {
        this.ParseRouteField(feed, header, route, header.GetColumn(idx), data[idx]);
    }
    return route;
}
There is no option to customize the separator.
Make adding shapes optional: if there is already a shapes file with content, the add-shapes command should optionally do nothing.
Parsing the feed-info is not yet supported.
According to the GTFS reference, the agency_id field is optional, but the details specify that agency_id is required if more than one agency is provided:
The agency_id field is an ID that uniquely identifies a transit agency. A transit feed may represent data from more than one agency. The agency_id is dataset unique. This field is optional for transit feeds that only contain data for a single agency.
Merge validator with the regular GTFS.Tool project using an extra switch.
/// <summary>
/// Returns all the shapes for the given trip id.
/// </summary>
/// <param name="tripId"></param>
/// <returns></returns>
public IEnumerable<Shape> GetShapes(string tripId)
{
    return _shapes.Where(x => x.Id == tripId);
}
It should instead be something more in line with the following (depending on the case you want to address):
/// <summary>
/// Returns all the shapes for the given trip id.
/// </summary>
/// <param name="tripId"></param>
/// <returns></returns>
public IOrderedEnumerable<Shape> GetShapes(string tripId)
{
    // Look up the trip first, then use its shape id to select the shape points.
    var shapeId = _trips.First(x => x.Id == tripId).ShapeId;
    return _shapes.Where(s => s.Id == shapeId).OrderBy(s => s.SequenceId);
}
The problem with the current version is that you can't query the shapes entity with a trip id. You can get the shape id from a trip, then use that (non-unique) shape id to get the many matching shapes back from the shapes list/table.
Currently, the source code checks that the stop sequences (per trip) are incremented by one.
However, the GTFS specification states that:
"The stop_sequence field identifies the order of the stops for a particular trip. The values for stop_sequence must be non-negative integers, and they must increase along the trip.
For example, the first stop on the trip could have a stop_sequence of 1, the second stop on the trip could have a stop_sequence of 23, the third stop could have a stop_sequence of 40, and so on."
Note also that zero seems to be a valid value.
Here is a suggested patch inside GTFS\GTFS\Validation\GTFSFeedValidation.cs:
/* check all sequences */
foreach (var stopTimesPair in stopTimesIndex)
{
    // null until the first stop time of the trip has been seen;
    // zero is a valid stop_sequence, so it cannot serve as the "no previous value" marker.
    uint? previous = null;
    foreach (var stopTime in stopTimesPair.Value)
    {
        var current = stopTime.StopSequence;
        if (previous.HasValue && previous.Value >= current)
        {
            messages = string.Format("Stop sequence values shall increase and be unique in stop_times file for trip id {0}.", stopTimesPair.Key);
            return false;
        }
        previous = current;
    }
}
The GTFSFeed.SetFeedInfo method should set the version from the argument, not its _feedInfo, i.e.: this._feedInfo.Version = feedInfo.Version;
In addition, the GTFSEntity.Tag property is not copied from the feedInfo argument to the _feedInfo instance field.
public int GetColumnIndex(string column) does not return -1 but 0 when column does not exist.
According to the GTFS specifications, the calendar.txt is required, but then again it is not.
see GTFS reference:
calendar.txt
- Required - Dates for service IDs using a weekly schedule. Specify when service starts and ends, as well as days of the week where service is available.
calendar_dates.txt
- Optional - Exceptions for the service IDs defined in the calendar.txt file. If calendar_dates.txt includes ALL dates of service, this file may be specified instead of calendar.txt.
At this location I have found a sample feed where the calendar.txt is missing, which will fail to be read because of the GTFSReader.GetRequiredFiles method.
So, the question is: should the GTFS library relax the requirement that calendar.txt is required and instead check whether one of calendar.txt or calendar_dates.txt is provided?
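The relaxed check asked about above could look roughly like the following. This is a sketch only: `HasServiceCalendar` is a hypothetical helper and is not part of GTFSReader, and it assumes the reader can list the file names present in the feed.

```csharp
using System;
using System.Collections.Generic;

public static class CalendarFilesSketch
{
    // Hypothetical helper: true when at least one of the two calendar files is present.
    public static bool HasServiceCalendar(IEnumerable<string> fileNames)
    {
        var names = new HashSet<string>(fileNames, StringComparer.OrdinalIgnoreCase);
        return names.Contains("calendar.txt") || names.Contains("calendar_dates.txt");
    }
}
```

A feed that ships only calendar_dates.txt would pass this check, while a feed with neither file would still be rejected.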
Please add debugging support by providing the .pdb file in the NuGet package.
Add a ToSimple() function to extended route type. We have the reverse already here:
https://github.com/itinero/GTFS/blob/develop/src/GTFS/Entities/Enumerations/RouteType.cs#L73
Make lat/lons nullable, they can be empty:
According to google reference there is an additional field called platform_code: https://developers.google.com/transit/gtfs/reference/#stopstxt
Are GTFS packages available for WinRT applications ? (Windows 8.1, Windows Phone 8.1). Classes like DirectoryInfo are not available on this platform ...
Add logging option to report warnings when not in strict-mode.
Class GTFS.Entities.Route (incorrectly) defines AgencyId as [Required] but the GTFS reference claims it is optional. In addition, the GTFSFeedValidation class checks the existence of the agency_id value in the list of known agencies. Obviously, if AgencyId is null, the check will fail.
// check routes.
var routeIds = new HashSet<string>();
foreach (var route in feed.Routes)
{
    if (routeIds.Contains(route.Id))
    { // oops, duplicate id.
        messages = string.Format("Duplicate route id found: {0}", route.Id);
        return false;
    }
    routeIds.Add(route.Id);
    if (!agencyIds.Contains(route.AgencyId))
    { // oops, unknown id.
        messages = string.Format("Unknown agency found in route {0}: {1}", route.Id, route.AgencyId);
        return false;
    }
}
When you write a feed to some path, it creates empty files for the entity sets that contain no items. It would be better not to create these files at all.
The values in the DropOffType enum were directly copied from PickupType. It would make sense to rather change them from e.g. NoPickup to NoDropOff and so on.
I haven't tested it to see what would happen, but there is a backward compatibility issue when adding more columns to existing tables the way you did. I would suggest adding this line to the end of SQLiteGTFSFeedDb.RebuildDb():
if (!ColumnExists("stop_time", "timepoint")) this.ExecuteNonQuery("ALTER TABLE [stop_time] ADD [timepoint] TEXT;");
This line, in RebuildDb(), uses the following new method:
/// <summary>
/// Checks if the given table contains a column with the given name.
/// </summary>
/// <param name="tableName">The table in this database to check.</param>
/// <param name="columnName">The column in the given table to look for.</param>
/// <returns>True if the given table contains a column with the given name.</returns>
private bool ColumnExists(string tableName, string columnName)
{
    var cmd = new SQLiteCommand("PRAGMA table_info(" + tableName + ")", _connection);
    var dr = cmd.ExecuteReader();
    while (dr.Read())
    {
        var value = dr.GetValue(1); // column 1 of the result contains the column names
        if (columnName.Equals(value))
        {
            dr.Close();
            return true;
        }
    }
    dr.Close();
    return false;
}
Further, I noticed that there are 2 CREATE TABLE statements for the stop_time table in RebuildDb. Not entirely sure what the effect of this is going to be, but I'm pretty sure it would just go ahead and create the table with the old structure (first CREATE TABLE command) and ignore the new structure.
Translations.txt is an extension containing the translations of (mainly) stops. The format is:
trans_id,lang,translation
Ingelmunster,fr,Ingelmunster
Ingelmunster,nl,Ingelmunster
Ingelmunster,de,Ingelmunster
Ingelmunster,en,Ingelmunster
Having support for this would greatly benefit itinero-transit.
Like this one: https://github.com/philvessey/NextDepartures (data access and logic are tied together, so they would need to be decoupled first to check the logic itself)
Hi,
When creating an export of shapes there is an optional field called shape_dist_traveled. This field does not allow exponential notation; however, by default this library uses exponential notation when the value is rather small (the default behaviour of .NET).
Can this be changed so that it never uses exponential notation?
Hint: this occurs in GTFSWriter.cs, in the method Write(IGTFSTargetFile file, IEnumerable entities), on the following line:
data[4] = this.WriteFieldDouble("shapes", "shape_dist_traveled", entity.DistanceTravelled);
Below is the warning you get when exporting data with this library and validating it with Google's feed validator:
Invalid numeric value 9.49307305463114E-05. Please ensure that the number includes an explicit whole number portion (ie. use 0.5 instead of .5), that you do not use the exponential notation (ie. use 0.001 instead of 1E-3), and that it is a properly formated decimal value.
in line 7 of shapes.txt
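One way to avoid the exponential notation reported above is a custom numeric format string with the invariant culture. The sketch below is not the library's WriteFieldDouble; `ToPlainDecimal` is a hypothetical helper, and the cap of 20 fractional digits is an arbitrary assumption.

```csharp
using System.Globalization;

public static class DecimalFormatSketch
{
    // Hypothetical helper: formats a double with an explicit whole-number part
    // and no exponent, using '.' as the decimal separator.
    public static string ToPlainDecimal(double value)
    {
        // '0' forces the leading digit; each '#' is an optional fractional digit.
        return value.ToString("0.####################", CultureInfo.InvariantCulture);
    }
}
```

Values smaller than the format's 20 fractional digits would be written as 0, which seems acceptable for distances but is a trade-off worth noting.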
As shown below, Color & TextColor properties should be of type string to hold values like "FFFFFF" instead of nullable int.
The specification also uses "FFFFFF" as an example value for route color.
The data file I am testing against are downloaded from MTA web site (Long Island Railroad).
Hi There,
I'm trying to parse a feed which utilises quotes ("") in addition to commas in every file. Would you have an example of how I could configure the reader to discard the quotes?
All files seem to be parsing except the calendar file. Here is a sample of what it looks like:
service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
FULLW,1,1,1,1,1,1,1,20160714,20161014
WE,0,0,0,0,0,1,1,20160714,20161014
"Z1+1","1","1","1","1","1","1","1","20160714","20161014"
The first two lines parse as they are from the test file. The last line which is from the feed does not and I get an error message saying:
"Could not parse value "20161014" in field end_date in file calendar.".
I'm pretty sure, there is a configuration item I'm missing. This also makes me wonder about the rest of the data. Can you please help?
Regards,
Udhay
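Until the reader handles quoted values itself, a field clean-up step along these lines might serve as a workaround for rows like the quoted calendar line above. `CleanField` is a hypothetical helper, not a GTFSReader setting, and it assumes the line has already been split into fields correctly (it does not deal with commas inside quotes).

```csharp
public static class FieldCleanupSketch
{
    // Hypothetical helper: trims whitespace and removes one pair of surrounding quotes.
    public static string CleanField(string raw)
    {
        if (raw == null) return null;
        var value = raw.Trim();
        if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
        {
            // Strip the outer quotes and unescape doubled quotes inside the field.
            value = value.Substring(1, value.Length - 2).Replace("\"\"", "\"");
        }
        return value;
    }
}
```

With this step applied before parsing, a quoted end_date such as "20161014" would reach the date parser without its quotes.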
The GTFS exceptions in namespace GTFS.Exceptions all derive from the System.Exception class.
In order to log all issues that occur while reading a GTFS feed, one now needs to either catch the base class System.Exception or catch all exceptions from namespace GTFS.Exceptions explicitly.
Catching System.Exception is considered bad practice. The second approach is also not ideal because if a new GTFS exception is added, it will pass through.
IMHO, it is best to add an abstract base class (e.g. class GTFSExceptionBase) for all GTFS exceptions in namespace GTFS.Exceptions.
Ugly workaround for now:
try
{
    gtfsData.GtfsFeed = reader.Read(gtfsDirectorySource);
}
catch (Exception ex)
{
    var type = ex.GetType();
    if (!type.FullName.StartsWith("GTFS.Exceptions"))
    {
        throw;
    }
    // TODO: add logging here
}
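The proposed base class could be sketched as follows. GTFSExceptionBase is the name suggested in this issue; `SampleParseException` is a hypothetical stand-in for the existing exception types, which would need to be changed to derive from the base class.

```csharp
using System;

// Proposed common base for all exceptions in GTFS.Exceptions (name taken from the suggestion above).
public abstract class GTFSExceptionBase : Exception
{
    protected GTFSExceptionBase(string message) : base(message) { }
}

// Hypothetical example of a concrete GTFS exception deriving from the base class.
public class SampleParseException : GTFSExceptionBase
{
    public SampleParseException(string message) : base(message) { }
}
```

Callers could then write `catch (GTFSExceptionBase ex)` instead of filtering on the type's namespace, and newly added exception types would be caught automatically.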
GTFSWriter writes the route color and route text color fields with a leading '#' (such as #FFFFFF). This is flagged as an error in Google's FeedValidator. It looks like the GTFS spec requires the color to be 6 characters only, without the '#'.
Here is the line where it explicitly writes out the '#'. Was this intentional for some reason, or just an oversight?
https://github.com/OsmSharp/GTFS/blob/master/GTFS/GTFSWriter.cs#L796
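A writer that emits the six-character form without the '#' might look like this. It is a sketch under the assumption that the color is held as a nullable int; `ToHexColor` is a hypothetical helper and not the code at the linked line.

```csharp
using System.Globalization;

public static class ColorFormatSketch
{
    // Hypothetical helper: writes a color as six uppercase hex digits, without a leading '#'.
    public static string ToHexColor(int? color)
    {
        if (!color.HasValue) return string.Empty;
        // Mask to the RGB bytes so any alpha component is dropped; X6 pads to six digits.
        return (color.Value & 0xFFFFFF).ToString("X6", CultureInfo.InvariantCulture);
    }
}
```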
The default TimeOfDayReader function on GTFSReader throws an exception on empty-string arrival_time and departure_time values, but this is actually a valid value for stop times that are supposed to be interpolated by the consumer, as per the GTFS spec (non-timepoint stop times).
In my application I worked around this by setting GTFSReader.TimeOfDayReader to a custom function that uses a negative TimeOfDay value to signify null, but the default implementation should probably handle this as it is valid GTFS. The problem is that ArrivalTime and DepartureTime are non-nullable structs on StopTime, so it would be a breaking change to the api to make them nullable.
I am trying to load the files from ftp://gtfs.mot.gov.il and keep getting an exception while loading the stop_times.txt file.
Have really enjoyed using this library and have also forked it on my github and made a bunch of changes. Have you thought about porting to net core/standard? We are using it in a .net core project and although I don't foresee any issues right now, it would be nice to have it available. Let me know.
Reference here: https://developers.google.com/transit/gtfs/reference#field_definitions
I started working on this. Will open PR when ready/pushable.
REMEMBER: update read/write logic, hash codes, equal operators
protected virtual Route ParseRoute(T feed, GTFSSourceFileHeader header, string[] data)
{
    // check required fields.
    this.CheckRequiredField(header, header.Name, this.RouteMap, "route_id");
    this.CheckRequiredField(header, header.Name, this.RouteMap, "route_short_name");
    this.CheckRequiredField(header, header.Name, this.RouteMap, "route_long_name");
    this.CheckRequiredField(header, header.Name, this.RouteMap, "route_desc"); //<------ This one
    this.CheckRequiredField(header, header.Name, this.RouteMap, "route_type");
See here: https://developers.google.com/transit/gtfs/reference?hl=en#routes_fields
Add the option to put one or more feeds in an SQLite DB. This adds the following features:
The SQLite DB can be used in-memory or file based and should also work on Android/iOS.
Add some general code to cleanup fields (like remove quotes, trim, etc.).
I am keen to use the latest. I notice that you have since introduced support for extended transport types: https://support.google.com/transitpartners/answer/3520902?hl=en which is great. It would be cool to pull that down through Nuget.
I'd like to use this library in one of my projects and I noticed that the master branch (which I supposed was the stable version of the library) was 4 years behind the develop branch (which is the default branch).
Thank you very much.
I had an idea today while trying to work in-memory with GTFS: it can make sense to do lookups, as I have seen in the code. The lookups, however, are likely inefficient (I have yet to test an approach here) as they use the LINQ First() method. I think putting things in a dictionary might make a lot of sense here, at least for finding items by Id. Currently my app takes 17 minutes to run, so I am seeing if I can bring that down a bit. I will let you know if I find any key information here.
Add a custom read-files override to better support customization of the GTFS reader.
Issue parsing calendar where there was whitespace after the end field (file downloaded from GCRTA attached: google_transit.zip).
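The dictionary idea can be sketched generically. `IdIndexSketch` is a hypothetical type, not part of the library; it builds the index once so that each lookup by id is a hash lookup rather than a linear scan with First().

```csharp
using System;
using System.Collections.Generic;

public sealed class IdIndexSketch<T>
{
    private readonly Dictionary<string, T> _byId = new Dictionary<string, T>();

    // Builds the index in a single pass; a later item with the same id replaces the earlier one.
    public IdIndexSketch(IEnumerable<T> items, Func<T, string> getId)
    {
        foreach (var item in items)
        {
            _byId[getId(item)] = item;
        }
    }

    // Returns false instead of throwing when the id is unknown.
    public bool TryGet(string id, out T item)
    {
        return _byId.TryGetValue(id, out item);
    }
}
```

Whether this shortens the 17-minute run depends on how much of that time is spent in lookups, so it would need measuring.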
StopTimeParser doesn't handle arrival/departure times >= 100:00:00
Example data for stop_times.txt
trip_id,stop_id,arrival_time,departure_time,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99528,100:15:00,100:30:00,18,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99534,102:45:00,103:45:00,19,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99535,107:00:00,107:30:00,20,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99533,110:15:00,114:30:00,21,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99540,118:30:00,118:45:00,22,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99509,122:00:00,122:15:00,23,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99515,125:15:00,126:00:00,24,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99511,128:45:00,129:15:00,25,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99512,131:15:00,134:45:00,26,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99522,137:00:00,137:15:00,27,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99524,139:15:00,139:30:00,28,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99523,142:00:00,142:15:00,29,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99518,143:45:00,144:15:00,30,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99514,147:15:00,147:30:00,31,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99513,150:45:00,151:15:00,32,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99519,153:00:00,153:00:00,33,,1,,
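A parser that accepts hour values of any size, as in the rows above, might look like the sketch below. It is not the library's StopTimeParser: `ParseSeconds` is a hypothetical helper, and representing the time as total seconds (with null for an empty value) is an assumption made for the example.

```csharp
using System;
using System.Globalization;

public static class TimeOfDaySketch
{
    // Hypothetical helper: returns seconds since the start of the service day, or null when empty.
    public static int? ParseSeconds(string value)
    {
        if (string.IsNullOrWhiteSpace(value)) return null;
        var parts = value.Trim().Split(':');
        if (parts.Length != 3) throw new FormatException("Expected H:MM:SS but got: " + value);
        // NumberStyles.None accepts digits only, so signs and whitespace are rejected.
        int hours = int.Parse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture);
        int minutes = int.Parse(parts[1], NumberStyles.None, CultureInfo.InvariantCulture);
        int seconds = int.Parse(parts[2], NumberStyles.None, CultureInfo.InvariantCulture);
        if (minutes > 59 || seconds > 59) throw new FormatException("Minutes or seconds out of range: " + value);
        // Hours are deliberately unbounded: values of 24 and above, including 100+, are allowed.
        return hours * 3600 + minutes * 60 + seconds;
    }
}
```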
Copy/paste error in comment: Creates a parsing exception.
Should be: Creates a missing file exception.
Missing 'not' in exception message.
Should be: "Could not find required file {0}."
/// <summary>
/// Creates a parsing exception.
/// </summary>
/// <param name="name"></param>
public GTFSRequiredFileMissingException(string name)
    : base(string.Format("Could find required file {0}.", name))
{
    this.Name = name;
}
Add option to make reader behave less strict about GTFS-spec.