gtfs's People

Contributors

cuevaskoch, derrickcrowne, ericpanorel, hypervtechnics, mmsatari, peetjvv, philvessey, pkillick, thzinc, xivk

gtfs's Issues

Parsing is slow

Parsing a feed from txt files is slower than it could be. Look at how this can be improved.

GTFSReader<GTFSFeed>().Read doesn't Dispose() FileStream ?

Hello xivk,

I may be wrong, but according to my investigation GTFSReader<GTFSFeed>().Read doesn't Dispose() the FileStream.

Here is a unit test to reproduce the issue:

using System;
using System.IO;
using GTFS.IO;
using NUnit.Framework;

namespace GTFS.Test
{
    [TestFixture]
    internal class FileNotDisposedTest
    {
        [Test]
        public void TestFileDisposed()
        {
            ReadAndParseGtfsSampleData();

            // If the reader left the file open, OpenWrite throws
            // System.IO.IOException: The process cannot access the file 'sample-feed\agency.txt'
            // because it is being used by another process.
            // The using block releases the write handle so the test itself leaks nothing.
            using (File.OpenWrite(@"sample-feed/agency.txt"))
            {
            }
        }

        private void ReadAndParseGtfsSampleData()
        {
            try
            {
                const string inputFolderPath = @"sample-feed";
                var directoryInfo = new DirectoryInfo(inputFolderPath);
                var gtfsDirectorySource = new GTFSDirectorySource(directoryInfo);
                var reader = new GTFSReader<GTFSFeed>();
                reader.Read(gtfsDirectorySource);
            }
            catch (Exception ex)
            {
                var type = ex.GetType();
                if (!type.FullName.StartsWith("GTFS.Exceptions"))
                {
                    throw;
                }
            }
        }
    }
}

If the issue is real and not a misuse on my part, it could be fixed by adding a file.Dispose() call at the end of GTFSReader.Read(IGTFSSourceFile file, T feed, EntityParseDelegate parser, EntityAddDelegate addDelegate) (I have not tested this yet).
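
The disposal pattern suggested above can be illustrated without the library. This is a minimal sketch, not the library's code: `DisposeSketch`, `ReadFirstLine` and `CanWriteAfterRead` are hypothetical names, and only `System.IO` types are used. It shows that a stream opened inside a `using` block is released before the same file is reopened for writing.

```csharp
using System;
using System.IO;

public static class DisposeSketch
{
    // Hypothetical helper: reads the first line and releases the file handle on exit.
    public static string ReadFirstLine(string path)
    {
        // Both the stream and the reader are disposed when the block ends,
        // even if ReadLine throws.
        using (var stream = File.OpenRead(path))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadLine();
        }
    }

    // Returns true when the file can be opened for writing after being read.
    public static bool CanWriteAfterRead(string path)
    {
        ReadFirstLine(path);
        try
        {
            using (File.OpenWrite(path)) { }
            return true;
        }
        catch (IOException)
        {
            return false;
        }
    }
}
```

If the reader kept its stream open instead, the `File.OpenWrite` call would fail with the IOException described in the test above.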

Route description is optional

According to the code route_desc is required, but according to the specifications it is not:

this.CheckRequiredField(header, header.Name, this.RouteMap, "route_desc");

see:

        protected virtual Route ParseRoute(T feed, GTFSSourceFileHeader header, string[] data)
        {
            // check required fields.
            this.CheckRequiredField(header, header.Name, this.RouteMap, "route_id");

            this.CheckRequiredField(header, header.Name, this.RouteMap, "route_short_name");
            this.CheckRequiredField(header, header.Name, this.RouteMap, "route_long_name");
            this.CheckRequiredField(header, header.Name, this.RouteMap, "route_desc");
            this.CheckRequiredField(header, header.Name, this.RouteMap, "route_type");

            // parse/set all fields.
            Route route = new Route();
            for (int idx = 0; idx < data.Length; idx++)
            {
                this.ParseRouteField(feed, header, route, header.GetColumn(idx), data[idx]);
            }
            return route;
        }

Make adding shapes optional

Make adding shapes optional: if there is already a shapes file with content, the add-shapes command should optionally do nothing.

Agency_id is required if multiple agencies in agency.txt

According to the GTFS reference, the agency_id field is optional, but the field details specify that agency_id is required if more than one agency is provided:

The agency_id field is an ID that uniquely identifies a transit agency. A transit feed may represent data from more than one agency. The agency_id is dataset unique. This field is optional for transit feeds that only contain data for a single agency.
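
A validation along these lines could be sketched as follows. This is a minimal sketch under stated assumptions, not the library's validator: `AgencyIdRuleSketch` and `AreAgencyIdsValid` are hypothetical names, and the input is simply the list of agency_id values read from agency.txt, with null or empty for a missing value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AgencyIdRuleSketch
{
    // Returns true when the agency ids satisfy the rule quoted above:
    // a single agency may omit agency_id; with several agencies every
    // agency needs a non-empty id, and the ids must be unique.
    public static bool AreAgencyIdsValid(IReadOnlyList<string> agencyIds)
    {
        if (agencyIds.Count <= 1)
        {
            return true;
        }
        if (agencyIds.Any(string.IsNullOrWhiteSpace))
        {
            return false;
        }
        return agencyIds.Distinct().Count() == agencyIds.Count;
    }
}
```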

Incorrect/misleading implementation for GetShapes(tripId) method. [Non blocking issue]

    /// <summary>
    /// Returns all the shapes for the given trip id.
    /// </summary>
    /// <param name="tripId"></param>
    /// <returns></returns>
    public IEnumerable<Shape> GetShapes(string tripId)
    {
        return _shapes.Where(x => x.Id == tripId);
    }

It should instead be something more in line with the following (depending on the case you want to address):

    /// <summary>
    /// Returns all the shapes for the given trip id.
    /// </summary>
    /// <param name="tripId"></param>
    /// <returns></returns>
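    public IOrderedEnumerable<Shape> GetShapes(string tripId)
    {
        // Look up the trip first; shapes are keyed by shape id, not by trip id.
        var shapeId = _trips.First(x => x.Id == tripId).ShapeId;

        // A shape id is shared by many shape points; return them in sequence order.
        return _shapes.Where(s => s.Id == shapeId).OrderBy(s => s.SequenceId);
    }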

The problem with the current version is that you can't query the shapes entity with a trip id. You can get the shape id from a trip, and then get the many shape points back from the shapes list/table using that (non-unique) shape id.

Sequences rule checking does not respect the GTFS rules

Currently, the source code checks that the stop sequences (per trip) increase by exactly one.

However, the GTFS specification states that:
"The stop_sequence field identifies the order of the stops for a particular trip. The values for stop_sequence must be non-negative integers, and they must increase along the trip.
For example, the first stop on the trip could have a stop_sequence of 1, the second stop on the trip could have a stop_sequence of 23, the third stop could have a stop_sequence of 40, and so on."

Note also that zero appears to be a valid value.

Here is a suggested patch for GTFS\GTFS\Validation\GTFSFeedValidation.cs:

        /* check all sequences */
        foreach (var stopTimesPair in stopTimesIndex)
        {
            // null until the first stop time of the trip has been seen; 0 is a valid sequence value.
            uint? previous = null;
            foreach (var stopTime in stopTimesPair.Value)
            {
                var current = stopTime.StopSequence;
                if (previous.HasValue && previous.Value >= current)
                {
                    messages = string.Format("Stop sequence values must increase and be unique in the stop_times file for trip id {0}.", stopTimesPair.Key);
                    return false;
                }
                previous = current;
            }
        }

Minor bugs in GTFSFeed.SetFeedInfo

The GTFSFeed.SetFeedInfo method should set the version from the argument, not from its own _feedInfo field,
i.e.: this._feedInfo.Version = feedInfo.Version;

In addition, the GTFSEntity.Tag property is not copied from the feedInfo argument to the _feedInfo instance field.

Calendar.txt mandatory or not?

According to the GTFS specifications, the calendar.txt is required, but then again it is not 😕
see GTFS reference:

calendar.txt - Required - Dates for service IDs using a weekly schedule. Specify when service starts and ends, as well as days of the week where service is available.

calendar_dates.txt - Optional - Exceptions for the service IDs defined in the calendar.txt file. If calendar_dates.txt includes ALL dates of service, this file may be specified instead of calendar.txt.

At this location I have found a sample feed where the calendar.txt is missing, which because of the GTFSReader.GetRequiredFiles method will fail to be read.

So, the question is: should the GTFS library relax the requirement that calendar.txt is required and instead check if one of calendar.txt or calendar_dates.txt is provided?
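
The relaxed requirement proposed in the question can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's GetRequiredFiles logic: `CalendarRequirementSketch` and `HasCalendarSource` are hypothetical names, and the input is the set of file names found in the feed.

```csharp
using System;
using System.Collections.Generic;

public static class CalendarRequirementSketch
{
    // Returns true when at least one of the two calendar files is present.
    public static bool HasCalendarSource(IEnumerable<string> fileNames)
    {
        var names = new HashSet<string>(fileNames, StringComparer.OrdinalIgnoreCase);
        return names.Contains("calendar.txt") || names.Contains("calendar_dates.txt");
    }
}
```

Note that this only checks for the presence of the files. Whether calendar_dates.txt really covers all dates of service would need a separate check.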

WinRT implementation

Are GTFS packages available for WinRT applications ? (Windows 8.1, Windows Phone 8.1). Classes like DirectoryInfo are not available on this platform ...

Route agency_id is optional

Class GTFS.Entities.Route (incorrectly) defines AgencyId as [Required], but the GTFS reference says it is optional. In addition, the GTFSFeedValidation class checks that the agency_id value exists in the list of known agencies. Obviously, if AgencyId is null, the check will fail.

            // check routes.
            var routeIds = new HashSet<string>();
            foreach(var route in feed.Routes)
            {
                if (routeIds.Contains(route.Id))
                { // oeps, duplicate id.
                    messages = string.Format("Duplicate route id found: {0}", route.Id);
                    return false;
                }
                routeIds.Add(route.Id);
                if (!agencyIds.Contains(route.AgencyId))
                {// oeps, unknown id.
                    messages = string.Format("Unknown agency found in route {0}: {1}", route.Id, route.AgencyId);
                    return false;
                }
            }
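
One way to make the check tolerate a missing agency id is sketched below. It is a minimal sketch, assuming that an absent agency_id should simply be accepted; `RouteAgencySketch` and `IsAgencyReferenceValid` are hypothetical names and not part of the library.

```csharp
using System;
using System.Collections.Generic;

public static class RouteAgencySketch
{
    // Returns true when the route's agency reference is acceptable:
    // an absent agency_id is allowed, a present one must be known.
    public static bool IsAgencyReferenceValid(string routeAgencyId, ISet<string> knownAgencyIds)
    {
        if (string.IsNullOrEmpty(routeAgencyId))
        {
            return true;
        }
        return knownAgencyIds.Contains(routeAgencyId);
    }
}
```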

backwards compatibility issue for timepoint in stop_times

I haven't tested it to see what would happen, but there is a backward compatibility issue when adding more columns to existing tables in the way that you did it. I would suggest adding this line to the end of SQLiteGTFSFeedDb.RebuildDb():

if (!ColumnExists("stop_time", "timepoint")) this.ExecuteNonQuery("ALTER TABLE [stop_time] ADD [timepoint] TEXT;");

This uses a new method, called from RebuildDb():

    /// <summary>
    /// Checks if the given table contains a column with the given name.
    /// </summary>
    /// <param name="tableName">The table in this database to check.</param>
    /// <param name="columnName">The column in the given table to look for.</param>
    /// <returns>True if the given table contains a column with the given name.</returns>
    private bool ColumnExists(string tableName, string columnName)
    {
        // tableName is concatenated into the SQL text, so it must come from trusted code.
        using (var cmd = new SQLiteCommand("PRAGMA table_info(" + tableName + ")", _connection))
        using (var dr = cmd.ExecuteReader())
        {
            while (dr.Read())
            {
                // Column 1 of the result contains the column name.
                var value = dr.GetValue(1);
                if (columnName.Equals(value))
                {
                    return true;
                }
            }
        }

        return false;
    }

Further, I noticed that there are 2 CREATE TABLE statements for the stop_time table in RebuildDb. Not entirely sure what the effect of this is going to be, but I'm pretty sure it would just go ahead and create the table with the old structure (first CREATE TABLE command) and ignore the new structure.

Support translations.txt

Translations.txt is an extension containing the translations of (mainly) stops. The format is:

trans_id,lang,translation
Ingelmunster,fr,Ingelmunster
Ingelmunster,nl,Ingelmunster
Ingelmunster,de,Ingelmunster
Ingelmunster,en,Ingelmunster

Having support for this would greatly benefit itinero-transit

distances in export are written in exponential notation instead of regular notation

Hi,

When creating an export of shapes there is an optional field called shape_dist_traveled. This field does not allow exponential notation, but by default this library writes exponential notation when the value is very small (the default behaviour of .NET).

Can this be changed so that it never uses exponential notation?
Hint: this occurs in GTFSWriter.cs, in the method Write(IGTFSTargetFile file, IEnumerable entities), on the following line:
data[4] = this.WriteFieldDouble("shapes", "shape_dist_traveled", entity.DistanceTravelled);

Below is the warning you get when exporting data with this library and validating it with Google's feed validator:

Invalid numeric value 9.49307305463114E-05. Please ensure that the number includes an explicit whole number portion (ie. use 0.5 instead of .5), that you do not use the exponential notation (ie. use 0.001 instead of 1E-3), and that it is a properly formated decimal value.
in line 7 of shapes.txt
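
One possible way to avoid the exponent is a custom numeric format string with the invariant culture. The following is a minimal sketch, not the library's WriteFieldDouble implementation: `PlainNumberSketch` and `ToPlainString` are hypothetical names, and the choice of 15 decimal places is an assumption.

```csharp
using System;
using System.Globalization;

public static class PlainNumberSketch
{
    // Formats a double in fixed-point notation with up to 15 decimal places,
    // always with a leading whole-number digit and never with an exponent.
    public static string ToPlainString(double value)
    {
        return value.ToString("0.###############", CultureInfo.InvariantCulture);
    }
}
```

The trade-off is that digits beyond the 15th decimal place are rounded away, which should be harmless for distances.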

Parsing a feed with quotes...

Hi There,

I'm trying to parse a feed which utilises quotes ("") in addition to commas in every file. Would you have an example of how I could configure the reader to discard the quotes?

All files seem to be parsing except the calendar file. Here is a sample of what it looks like:

service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
FULLW,1,1,1,1,1,1,1,20160714,20161014
WE,0,0,0,0,0,1,1,20160714,20161014
"Z1+1","1","1","1","1","1","1","1","20160714","20161014"

The first two lines parse as they are from the test file. The last line which is from the feed does not and I get an error message saying:
"Could not parse value "20161014" in field end_date in file calendar.".

I'm pretty sure there is a configuration item I'm missing. This also makes me wonder about the rest of the data. Can you please help?

Regards,
Udhay
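
Independently of any reader setting, the quotes in a field such as "20161014" can be stripped before the value is parsed. The following is a minimal sketch of that idea, not a library feature: `QuotedFieldSketch` and `Unquote` are hypothetical names, and it handles a single already-split field rather than a whole CSV line.

```csharp
using System;

public static class QuotedFieldSketch
{
    // Trims surrounding whitespace, removes one pair of surrounding double
    // quotes from a CSV field, and turns escaped quotes ("") inside it back
    // into a single quote. Unquoted fields are returned trimmed.
    public static string Unquote(string field)
    {
        if (field == null)
        {
            return null;
        }
        var trimmed = field.Trim();
        if (trimmed.Length >= 2 && trimmed[0] == '"' && trimmed[trimmed.Length - 1] == '"')
        {
            return trimmed.Substring(1, trimmed.Length - 2).Replace("\"\"", "\"");
        }
        return trimmed;
    }
}
```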

Missing base class for all GTFS exceptions

The GTFS exceptions in namespace GTFS.Exceptions all derive from the System.Exception class.

In order to log all issues that occur while reading a GTFS feed, one now needs to either catch the base class System.Exception or catch all exceptions from namespace GTFS.Exceptions explicitly.

Catching System.Exception is considered bad practice. The second approach is also not ideal because if a new GTFS exception is added, it will pass through.

IMHO, it is best to add an abstract base class (e.g. class GTFSExceptionBase) for all GTFS exceptions in namespace GTFS.Exceptions.

Ugly workaround for now:

        try
        {
            gtfsData.GtfsFeed = reader.Read(gtfsDirectorySource);
        }
        catch (Exception ex)
        {
            var type = ex.GetType();
            if (!type.FullName.StartsWith("GTFS.Exceptions"))
            {
                throw;
            }

            // TODO: add logging here
        }
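
The proposed base class could look like the following. This is a minimal sketch: `GTFSExceptionBase` is the name suggested above, while `SampleParseException`, `BaseExceptionSketch` and `TryRead` are hypothetical stand-ins that do not exist in the library.

```csharp
using System;

// Hypothetical common base class, named after the suggestion above.
public abstract class GTFSExceptionBase : Exception
{
    protected GTFSExceptionBase(string message) : base(message) { }
}

// Illustrative stand-in for one concrete exception; not the library's actual class.
public class SampleParseException : GTFSExceptionBase
{
    public SampleParseException(string message) : base(message) { }
}

public static class BaseExceptionSketch
{
    // Runs the action and returns the message of any GTFS exception it throws,
    // or null on success. Unrelated exceptions propagate to the caller.
    public static string TryRead(Action read)
    {
        try
        {
            read();
            return null;
        }
        catch (GTFSExceptionBase ex)
        {
            return ex.Message;
        }
    }
}
```

With such a base class, the string comparison on the type's full name is no longer needed, and newly added GTFS exceptions are caught automatically.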

GTFSReader can't handle Stop Times with empty Arrive and Depart times

The default TimeOfDayReader function on GTFSReader throws an exception on empty-string arrival_time and departure_time values, but an empty value is actually valid for stop times that are supposed to be interpolated by the consumer, as per the GTFS spec (non-timepoint stop times).

In my application I worked around this by setting GTFSReader.TimeOfDayReader to a custom function that uses a negative TimeOfDay value to signify null, but the default implementation should probably handle this as it is valid GTFS. The problem is that ArrivalTime and DepartureTime are non-nullable structs on StopTime, so it would be a breaking change to the api to make them nullable.

System.OutOfMemoryException

I am trying to load the files from ftp://gtfs.mot.gov.il and keep getting this exception while loading the stop_times.txt file.

Any plans to port to .net core/standard?

Have really enjoyed using this library and have also forked it on my github and made a bunch of changes. Have you thought about porting to net core/standard? We are using it in a .net core project and although I don't foresee any issues right now, it would be nice to have it available. Let me know.

Update to latest GTFS reference

Reference here: https://developers.google.com/transit/gtfs/reference#field_definitions

I started working on this. Will open PR when ready/pushable.

Changes to be made

  • Everywhere
    • Update xml documentation from reference
    • Check filename attributes
    • Enums fixed numeric values
  • stops.txt
    • Location type
    • Wheelchair boarding
    • Parent station
  • routes.txt
    • Route sort order
  • trips.txt
    • Bikes allowed
  • stop_times.txt
    • Time point type nullable (?)
  • fare_attributes.txt
    • Transfers as enum (?)
    • Duration as uint (?)
    • Agency id
    • Price as double (?)
  • frequencies.txt
    • Headway secs as uint
    • Start and end time as DateTime
    • Exact times as enum
  • transfers.txt
    • Minimum transfer time as uint?
  • feed_info.txt
    • Contact email and url
    • Implement parsing (#56)
  • levels.txt and pathways.txt
    • Add to model?

REMEMBER: update read/write logic, hash codes, equal operators

Entities/Files to be checked

  • agency.txt
  • stops.txt
  • routes.txt
  • trips.txt
  • stop_times.txt
  • calendar.txt
  • calendar_dates.txt
  • fare_attributes.txt
  • fare_rules.txt
  • shapes.txt
  • frequencies.txt
  • transfers.txt
  • pathways.txt
  • levels.txt
  • feed_info.txt
  • Are the file requirements respected (if possible depending on strict mode)?

In routes.txt the route_desc field should be optional, but it appears to be mandatory.

    protected virtual Route ParseRoute(T feed, GTFSSourceFileHeader header, string[] data)
    {
        // check required fields.
        this.CheckRequiredField(header, header.Name, this.RouteMap, "route_id");

        this.CheckRequiredField(header, header.Name, this.RouteMap, "route_short_name");
        this.CheckRequiredField(header, header.Name, this.RouteMap, "route_long_name");
        this.CheckRequiredField(header, header.Name, this.RouteMap, "route_desc"); //<------ This one
        this.CheckRequiredField(header, header.Name, this.RouteMap, "route_type");

See here: https://developers.google.com/transit/gtfs/reference?hl=en#routes_fields

Add option to put feed(s) into an SQLite DB

Add the option to put one or more feeds in an SQLite DB. This adds the following features:

  • More advanced querying.
  • Store already parsed feeds in a file DB.
  • Combine feeds.

The SQLite DB can be used in-memory or file based and should also work on Android/iOS.

Why is the master branch four years behind?

I'd like to use this library in one of my projects and I noticed that the master branch (which I supposed was the stable version of the library) was 4 years behind the develop branch (which is the default branch).

  1. Which one should I use, and why is master not being updated?
  2. What is the relationship between the NuGet packages and the branches?

Thank you very much.

Optimization idea for faster lookups

While working in-memory with GTFS today, I noticed that the code does a lot of lookups. These lookups are likely inefficient (I have yet to test this) because they use the LINQ First() method, which scans the collection on every call. I think putting things in a dictionary would make a lot of sense here, at least for finding items by Id. Currently my app takes 17 minutes to run, so I am seeing if I can bring that down a bit. I will let you know if I find anything useful.
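
The dictionary idea can be sketched as follows. This is a minimal sketch, not code from the library: `LookupIndexSketch` and `BuildIndex` are hypothetical names, and it assumes that ids are unique within a collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupIndexSketch
{
    // Builds an id-to-item index in one pass so later lookups are O(1)
    // instead of a linear scan with First(). Assumes ids are unique;
    // ToDictionary throws ArgumentException on a duplicate key.
    public static Dictionary<string, T> BuildIndex<T>(IEnumerable<T> items, Func<T, string> getId)
    {
        return items.ToDictionary(getId, StringComparer.Ordinal);
    }
}
```

A caller would build the index once, for example `var routesById = LookupIndexSketch.BuildIndex(feed.Routes, r => r.Id);`, and then use `routesById.TryGetValue(id, out var route)` inside loops.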

Parsing of stop_time doesn't handle arrival/departure times of 100 hours or more

StopTimeParser doesn't handle arrival/departure times >= 100:00:00

Example data for stop_times.txt
trip_id,stop_id,arrival_time,departure_time,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99528,100:15:00,100:30:00,18,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99534,102:45:00,103:45:00,19,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99535,107:00:00,107:30:00,20,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99533,110:15:00,114:30:00,21,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99540,118:30:00,118:45:00,22,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99509,122:00:00,122:15:00,23,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99515,125:15:00,126:00:00,24,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99511,128:45:00,129:15:00,25,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99512,131:15:00,134:45:00,26,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99522,137:00:00,137:15:00,27,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99524,139:15:00,139:30:00,28,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99523,142:00:00,142:15:00,29,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99518,143:45:00,144:15:00,30,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99514,147:15:00,147:30:00,31,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99513,150:45:00,151:15:00,32,,,,
NRI:VehicleJourney:000127-0001689-,NSR:Quay:99519,153:00:00,153:00:00,33,,1,,
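
A parser that does not assume a two-digit hour field could be sketched as follows. This is a minimal sketch, not the library's StopTimeParser or TimeOfDayReader: `LongTimeOfDaySketch` and `TryParseTotalSeconds` are hypothetical names, and the result is expressed as total seconds rather than the library's TimeOfDay type.

```csharp
using System;
using System.Globalization;

public static class LongTimeOfDaySketch
{
    // Parses "H:MM:SS", "HH:MM:SS" or "HHH:MM:SS" into total seconds since
    // the start of the service day. Returns false when the text is malformed.
    public static bool TryParseTotalSeconds(string text, out long totalSeconds)
    {
        totalSeconds = 0;
        if (string.IsNullOrWhiteSpace(text))
        {
            return false;
        }
        var parts = text.Trim().Split(':');
        if (parts.Length != 3)
        {
            return false;
        }
        int hours, minutes, seconds;
        // NumberStyles.None accepts digits only: no sign, no inner whitespace.
        if (!int.TryParse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture, out hours) ||
            !int.TryParse(parts[1], NumberStyles.None, CultureInfo.InvariantCulture, out minutes) ||
            !int.TryParse(parts[2], NumberStyles.None, CultureInfo.InvariantCulture, out seconds))
        {
            return false;
        }
        if (minutes > 59 || seconds > 59)
        {
            return false;
        }
        totalSeconds = (long)hours * 3600 + minutes * 60 + seconds;
        return true;
    }
}
```

Splitting on the colon instead of reading fixed character positions is what allows the hour field to grow to three or more digits.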

Typos in GTFSRequiredFileMissingException constructor

Copy/paste error in comment: Creates a parsing exception.
Should be: Creates a missing file exception.

Missing 'not' in exception message.
Should be: "Could not find required file {0}."

    /// <summary>
    /// Creates a parsing exception.
    /// </summary>
    /// <param name="name"></param>
    public GTFSRequiredFileMissingException(string name)
        : base(string.Format("Could find required file {0}.", name))
    {
        this.Name = name;
    }
