linkedconnections / gtfs2lc
GTFS to Linked Connections converter
Home Page: http://linkedconnections.org
License: MIT License
fast-csv was updated from 3.2.0 to 3.3.0. This version is covered by your current version range, and after updating it in your project the build failed.
fast-csv is a direct dependency of this project, and it is very likely causing the break. If other packages depend on yours, this update is probably also breaking those in turn.
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
In order to achieve a faster translation, we could split the connections.txt file into parts and launch multiple gtfs2lc.js workers, depending on the number of cores of the machine.
If both are unavailable, we just discard this connection right now. However, when the Real-Time version is then run, it will not be able to match an RT update with this connection. We should keep it in there after all.
If a lot of calendar entries are present in a GTFS file for the same trip id, the script will crash. This is caused in calendar.js:88.
RangeError: Maximum call stack size exceeded
at RegExp.test (<anonymous>)
at expandFormat (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:627:48)
at configFromStringAndFormat (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:2407:18)
at prepareConfig (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:2575:13)
at createFromConfig (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:2544:44)
at createLocalOrUTC (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:2631:16)
at createLocal (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:2635:16)
at hooks (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:12:29)
at StreamIterator.next (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/StreamIterator.js:34:56)
at CalendarToServices._processCalendarDates (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/services/calendar.js:86:33)
at /home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/services/calendar.js:88:14
at StreamIterator.next (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/StreamIterator.js:36:5)
at CalendarToServices._processCalendarDates (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/services/calendar.js:86:33)
at /home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/services/calendar.js:88:14
at StreamIterator.next (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/StreamIterator.js:36:5)
at CalendarToServices._processCalendarDates (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/services/calendar.js:86:33)
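The trace alternates between StreamIterator.next and _processCalendarDates, which suggests one recursive hop per calendar entry. A minimal sketch of the usual fix (not the actual gtfs2lc code) is to drain the entries in a loop so the stack stays flat:

```javascript
// Sketch only: a loop-based drain instead of the recursive
// next() -> callback -> next() pattern suggested by the stack trace.
// `iterator` is assumed to return null when exhausted.
function processAll(iterator, handleEntry) {
  let entry;
  while ((entry = iterator.next()) !== null) {
    handleEntry(entry); // handle one calendar entry, then loop
  }
}
```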
Example GTFS file (Too big for git): https://filehost.net/89f4172762918be7
I am aware that this GTFS file does not follow GTFS best practices (calendar.txt is not used; calendar_dates.txt is used instead), but if mass adoption is to follow, it might be best to support this.
We are starting to talk about using Itinero-transit commercially, but not having platform numbers is a bit of a deal breaker for that.
I know this is a bit of an issue, so I wanted to log this here and (re)start the effort to get the data.
calendar.txt is required according to the GTFS reference, yet not all GTFS feeds actually provide one. We should handle the case where no calendar.txt is available.
GTFS block_ids are local identifiers that might change from one version to the next. By exposing the information of their associated trips, we enable the creation of stable URIs via the URI template configuration.
Connection’s headsign is incorrect according to the spec.
fast-csv was updated from 4.0.3 to 4.1.0. This version is covered by your current version range, and after updating it in your project the build failed.
fast-csv is a direct dependency of this project, and it is very likely causing the break. If other packages depend on yours, this update is probably also breaking those in turn.
The new version differs by 4 commits.
682710d
v4.1.0
b9dd314
Merge pull request #327 from C2FO/v4.1.0-rc
22e4fb7
Added benchmarks for files of 1000 and 10000
c0d8f72
Added headers
event #321
See the full diff
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
How are we going to give a unique and persistent identifier to the connections, even when the data gets erased and added again?
First of all, for a real-world connection published by us, I suggest using http://id.linkedconnections.org/{feedid}/{version}/{localid}.
The localid is then composed out of:
- the gtfs:trip local identifier
- a YYYY-MM-DD string for the day the trip is executed
Mind that this URI strategy is specific to gtfs2lc. A different URI strategy can be chosen for different systems.
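A minimal sketch of the localid composition above (parameter names and function shape are this example's assumptions, not a gtfs2lc API):

```javascript
// Sketch of the suggested URI strategy:
// http://id.linkedconnections.org/{feedid}/{version}/{localid}
// where localid = the trip's local identifier + the day it is executed.
function connectionUri(feedid, version, tripId, serviceDate /* 'YYYY-MM-DD' */) {
  return `http://id.linkedconnections.org/${feedid}/${version}/${tripId}/${serviceDate}`;
}
```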
I have created a separate repository for redirecting (303) GET requests to URIs of pages containing this connection.
A trip is executed with a certain mode. E.g., Bus, Tram, Gondola, etc.
How can we map this into gtfs2lc?
Wait for #60
Make sure we can add -s and -e to the command line parameters with a startDate and endDate for what we want to transform. Both parameters should be optional
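A sketch of how optional -s/-e could behave (a plain argv scan for illustration; gtfs2lc's real CLI wiring may differ). ISO YYYY-MM-DD dates compare correctly as plain strings, so the window check stays simple:

```javascript
// Sketch only: parse optional -s/--startDate and -e/--endDate from argv.
function parseDateRange(argv) {
  const opts = { startDate: null, endDate: null };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === '-s' || argv[i] === '--startDate') opts.startDate = argv[++i];
    else if (argv[i] === '-e' || argv[i] === '--endDate') opts.endDate = argv[++i];
  }
  return opts;
}

// A connection is kept when its departure day falls inside the window.
// Both bounds are optional; a missing bound means "no limit on that side".
function inRange(departureDate, { startDate, endDate }) {
  if (startDate && departureDate < startDate) return false;
  if (endDate && departureDate > endDate) return false;
  return true;
}
```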
When trying to convert the attached GTFS file, GTFS2LC crashes
3|datasets | Indexing services and routes succesful!
3|datasets | Error: Unhandled "error" event. (Did not find this trip id in trips.txt: 55408414)
3|datasets | at ConnectionsBuilder.emit (events.js:186:19)
3|datasets | at ConnectionsBuilder.onerror (_stream_readable.js:663:12)
3|datasets | at emitOne (events.js:116:13)
3|datasets | at ConnectionsBuilder.emit (events.js:211:7)
3|datasets | at onwriteError (_stream_writable.js:417:12)
3|datasets | at onwrite (_stream_writable.js:439:5)
3|datasets | at ConnectionsBuilder.afterTransform (_stream_transform.js:90:3)
3|datasets | at _expandTrip.then.catch (/var/www/dk.lc.bertmarcelis.be/node_modules/gtfs2lc/lib/ConnectionsBuilder.js:79:5)
3|datasets | at <anonymous>
3|datasets | at process._tickCallback (internal/process/next_tick.js:189:7)
My guess: FeedValidator does not complain about unused trips. Only trips that are in trips.txt should be used, not all trips in stop_times.txt.
Removing references to this trip from stop_times resolves the issue.
routes.txt still doesn’t get UTF-16 artefacts removed
GTFS to test with: https://www.data.gouv.fr/en/datasets/offre-de-transport-rbus-de-la-c-a-de-rochefort-ocean/
Fix by adding routes.txt to gtfs2lc-sort.
Thanks @l-vincent-l for showing this bug!
After installing with "npm install" the modules csv, level, n3, q and unzip, I started the execution of gtfs2lc at 10:35 am: ./gtfs-csv2connections path/to/data/transitEMT.zip > path/to/data/emtConnections.ttl
Let me know if you want to access the EMT GTFS data.
After completing some steps, at 4:15 pm it crashed with the following message:
Draining Agencies
Transforming Calendar
Transforming CalendarDates
Transforming Frequencies
Transforming Routes
Draining Shapes and Shape Segments
Draining Stops
Transforming Stop Times
Transforming Trips
Transforming GTFS store to arrival/departures
FATAL ERROR: JS Allocation failed - process out of memory
Aborted (core dumped)
The system crash message is titled "nodejs crashed with SIGABRT in v8::Function::Call()". A crash report was created at /var/crash/ (~90MB).
The following folders were created at the execution path: arrivals, dates, departures, and stop_times. They all contain .ldb documents and a LOG, among others. Let me know if you need to see any of the logs.
I am using Ubuntu 14.04.1 LTS "Trusty Tahr", running on a Toshiba Portégé with Intel CORE i7 and 16GB RAM.
Different options to implement the identifier strategy for e.g., keeping a block ID persistent:
Using just local identifiers for e.g., a block id will give 2 problems:
Suggested solution to introduce a global identifier:
https://example.org/blocks/{block_id}
This solves the problem of federating over different sources, but not yet the problem of making it work when an updated GTFS file gets translated to LC (unless, for your GTFS feed, block ids are incremental over time and you can rely on this).
So we need to scope it to the specific GTFS feed and this brings us to another problem: how do you identify this specific GTFS feed or the fact it got translated to RDF here.
Suggestion for a version number: use feed_version from feed_info.txt.
Design issue: don’t include the patch version, so that a block id stays the same when the major and minor version numbers didn’t change (e.g., 1.2.0 → 1.2.1)?
The gtfs2lc URI template for e.g. a block then becomes:
https://example.org/blocks/{feed_version}/{block_id}
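Expanding such a template can be sketched with a small replacer (an illustration, not gtfs2lc's actual template engine):

```javascript
// Sketch: expand {name} placeholders in a URI template from a values map.
function expandTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (_, key) => values[key]);
}
```

For example, with feed_version "1.2" and block_id "42" the template yields https://example.org/blocks/1.2/42.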
When converting to jsonld (or mongold), our process takes a lot longer because jsonld compacting takes place after the raw triples are brought together in a JSON object.
We could, however, do the conversion to jsonld a lot faster by just converting the JSON objects that come out of the transformer, instead of using the jsonld-stream library.
When modifying the trip rule object to add its start time we need to make sure a new trip rule object gets created for each iteration of this for loop to avoid wrongful mutation when doing fast stream reading.
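The per-iteration copy can be sketched as below (names are illustrative, not the actual gtfs2lc internals); spreading the base rule into a fresh object per start time means no two connections share a mutable rule:

```javascript
// Sketch: build a fresh rule object per start time instead of mutating one
// shared object, which can be overwritten mid-flight during fast stream reads.
function expandTripRules(baseRule, startTimes) {
  return startTimes.map((startTime) => ({ ...baseRule, startTime }));
}
```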
Right now, example.org is used in every RDF output. This should be changed to something configurable.
I suggest -b --baseUris <baseUri>: a mapping file with base URIs for RDF outputs.
Add a way to import the data into mongodb
\n vs. \r\n: De Lijn uses them randomly. Just make sure that when sorting, we also normalize every line ending to a single \n.
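The normalization can be sketched in one line; something like this could run before sorting (an illustration, not the actual gtfs2lc-sort implementation):

```javascript
// Sketch: normalize \r\n (and any stray \r) to a single \n before sorting,
// so mixed line endings cannot split or merge records.
function normalizeNewlines(text) {
  return text.replace(/\r\n|\r/g, '\n');
}
```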
Make sure that if a leveldb already exists in the directory, it's thrown away before starting
moment.js has performance issues (date-fns/date-fns#275 (comment))
We should move to date-fns for our dates and work on top of the native Date JS object
https://github.com/date-fns/date-fns
@julianrojas87 Also relevant for GTFS-RT2LC and the LC Server
I tried to convert the 2021-02-12 VBB GTFS feed.
npm init --yes
npm i gtfs2lc -D
wget -r --no-parent --no-directories -P gtfs -N 'https://vbb-gtfs.jannisr.de/2021-02-12/'
# rename all .csv to #.txt …
env NODE_ENV=production gtfs2lc gtfs -f jsonld | head -n 3
# GTFS to linked connections converter use --help to discover more functions
# Indexing of stops, services, routes and trips completed successfully!
# Created worker thread (PID 1)
# Created worker thread (PID 2)
# Created worker thread (PID 3)
# Created worker thread (PID 4)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_0.txt'] {
# errno: -2,
# code: 'ENOENT',
# syscall: 'open',
# path: 'gtfs/connections_0.txt'
# }
# Error: Worker stopped with exit code 1
# at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
# at Worker.emit (node:events:378:20)
# at Worker.[kOnExit] (node:internal/worker:260:10)
# at Worker.<computed>.onexit (node:internal/worker:187:20)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_1.txt'] {
# errno: -2,
# code: 'ENOENT',
# syscall: 'open',
# path: 'gtfs/connections_1.txt'
# }
# Error: Worker stopped with exit code 1
# at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
# at Worker.emit (node:events:378:20)
# at Worker.[kOnExit] (node:internal/worker:260:10)
# at Worker.<computed>.onexit (node:internal/worker:187:20)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_2.txt'] {
# errno: -2,
# code: 'ENOENT',
# syscall: 'open',
# path: 'gtfs/connections_2.txt'
# }
# Error: Worker stopped with exit code 1
# at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
# at Worker.emit (node:events:378:20)
# at Worker.[kOnExit] (node:internal/worker:260:10)
# at Worker.<computed>.onexit (node:internal/worker:187:20)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_3.txt'] {
# errno: -2,
# code: 'ENOENT',
# syscall: 'open',
# path: 'gtfs/connections_3.txt'
# }
# Error: Worker stopped with exit code 1
# at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
# at Worker.emit (node:events:378:20)
# at Worker.[kOnExit] (node:internal/worker:260:10)
# at Worker.<computed>.onexit (node:internal/worker:187:20)
It has also created 4 files inside gtfs:
ls -l gtfs
# -rw-r--r--@ 1 j staff 3537 Feb 22 01:10 agency.txt
# -rw-r--r-- 1 j staff 79382 Feb 22 01:14 calendar.txt
# -rw-r--r-- 1 j staff 859354 Feb 22 01:14 calendar_dates.txt
# -rw-r--r--@ 1 j staff 64 Feb 22 01:10 frequencies.txt
# -rw-r--r--@ 1 j staff 140 Feb 22 01:10 pathways.txt
# -rw-r--r-- 1 j staff 0 Feb 22 01:23 raw_0.json
# -rw-r--r-- 1 j staff 0 Feb 22 01:23 raw_1.json
# -rw-r--r-- 1 j staff 0 Feb 22 01:23 raw_2.json
# -rw-r--r-- 1 j staff 0 Feb 22 01:23 raw_3.json
# -rw-r--r--@ 1 j staff 48812 Feb 22 01:10 routes.txt
# -rw-r--r--@ 1 j staff 143590907 Feb 22 01:10 shapes.txt
# -rw-r--r-- 1 j staff 269753688 Feb 22 01:14 stop_times.txt
# -rw-r--r--@ 1 j staff 4723089 Feb 22 01:10 stops.txt
# -rw-r--r--@ 1 j staff 4200935 Feb 22 01:10 transfers.txt
# -rw-r--r--@ 1 j staff 14019736 Feb 22 01:10 trips.txt
I stumbled upon this because, when building gtfs-utils, I identified this as a bug: GTFS Time values are defined relative to "12 hours before noon" of the service day, so the implementation in this repo seems to fail during DST switches.
gtfs2lc/lib/ConnectionsBuilder.js
Lines 15 to 17 in cb0bdac
related: google/transit#15
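A sketch of the spec-correct computation (timezone handling omitted for brevity; `serviceDay` is assumed to be local midnight of the service day). Anchoring on "noon minus 12 hours" instead of midnight keeps times correct across a DST switch, because local noon is unaffected by the transition:

```javascript
// Sketch: interpret a GTFS Time 'HH:MM:SS' as "noon of the service day
// minus 12 hours, plus HH:MM:SS", as the spec defines it. Hours may
// exceed 24 for trips running past midnight.
function gtfsTimeToDate(serviceDay /* Date at local midnight */, gtfsTime) {
  const [h, m, s] = gtfsTime.split(':').map(Number);
  const noon = new Date(serviceDay);
  noon.setHours(12, 0, 0, 0); // local noon of the service day
  const base = noon.getTime() - 12 * 3600 * 1000; // "12 hours before noon"
  return new Date(base + (h * 3600 + m * 60 + s) * 1000);
}
```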
We need to map the minimum transfer times to Linked Connections as well. Not sure how to model that. Any ideas?
Basically we want to express that if you’re at a gtfs:Station, you can transfer from its gtfs:Stops only if you take into account a minimum transfer time of X seconds.
The path is always needed. Thus, don’t require the -p option any longer.
We need to process frequencies as well...
This means reading an extra file if it exists, and adding extra connections based on the connectionRules stream
Support:
When installed globally, gtfs2lc is copied to /bin/, but then the file stoptimes2connections.js cannot be found anymore, as it is not at $CURDIR/stoptimes2connections.js.
It is valid GTFS to define stop_times with both an empty arrival_time and an empty departure_time. However, we cannot have Connections with empty departure or arrival times.
According to the spec, this implies that the consumer needs to interpolate these stop times, which is difficult in this case given that Connections are created in a streaming way.
A possible fix could be to do this as part of the pre-processing step that already takes place to order stop_times. Reusing an existing tool that handles this scenario might make things easier.
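Linear interpolation over a trip's ordered stop times could be sketched as below (times as seconds since the service day start, null meaning blank; this assumes, as the spec requires, that the first and last stop have explicit times):

```javascript
// Sketch: fill null stop times by spacing them evenly between the
// surrounding known times of the same trip.
function interpolateStopTimes(times) {
  const out = times.slice();
  for (let i = 0; i < out.length; i++) {
    if (out[i] !== null) continue;
    const prev = i - 1; // assumed known: first/last stop have explicit times
    let next = i;
    while (out[next] === null) next++;
    const gap = next - prev;
    for (let j = prev + 1; j < next; j++) {
      out[j] = out[prev] + ((out[next] - out[prev]) * (j - prev)) / gap;
    }
    i = next;
  }
  return out;
}
```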
Make sure that timezone is configurable or readable from the feed info
I'm trying to run gtfs2lc on macOS but some problems appear:
dchaves$ gtfs2lc-sort metro/
Converting newlines dos2unix
Removing UTF-8 artifacts in directory metro/
Sorting files in directory metro/
dchaves$ gtfs2lc metro/
GTFS to linked connections converter use --help to discover more functions
The same dataset in an Ubuntu dist works perfectly.
Make another script which is exposed through the gtfs2lc command which ensures the ordering of the extracted GTFS files.
After thinking about this with @smazzoleni, we need the flexibility to generate the baseURIs with arbitrary JavaScript functionality instead of only URI templates.
The solution we favored was to turn the baseURIs.json config file into a JS file with methods instead. Every kind of GTFS feed would then extend this class for its own system.
One such class could actually be an implementation where a config file with URI templates is taken into account (for backwards compatibility), possibly linked as follows for slightly more functionality (idea by @smazzoleni):
{
"stop": "http://data.gtfs.org/example/stops/{stop_id}",
"route": "http://data.gtfs.org/example/routes/{route_short_id}",
"trip": "http://data.gtfs.org/example/trips/{trip_id}/{trip_startTime}",
"connection": "http://example/linkedconnections.org/connections/{trip_startTime}/{departureStop}/{trip_id}",
"resolve": {
"route_short_id": "connection.trip.route.route_id.substring(0,5)",
"trip_id": "connection.trip.trip_id",
"trip_startTime": "format(connection.trip.startTime, 'YYYYMMDDTHHMM');",
"departureStop": "connection.departureStop"
}
}
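Evaluating the resolve entries could be sketched with the Function constructor (an assumption about the mechanism, not an implemented design; a real version would need sandboxing and helpers such as format in scope):

```javascript
// Sketch: evaluate each "resolve" expression against a connection object
// and return a map of template variables. Each configured expression may
// reference the `connection` parameter.
function resolveVars(resolve, connection) {
  const vars = {};
  for (const [name, expr] of Object.entries(resolve)) {
    // new Function is used for illustration only; it runs the configured
    // expression with `connection` bound as a parameter.
    vars[name] = new Function('connection', `return (${expr});`)(connection);
  }
  return vars;
}
```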
Does this expect milliseconds instead of ISO 8601?
The NMBS operates several trains which are split into two independent trains at some point in their journey. The reverse situation also occurs, with two trains merging into one. Splitting or merging always takes place in a station where travellers can (dis)embark the train.
First of all, the train drives the first part of its journey as a whole, during which it is identified by a single identifier, in this case IC4310. When the train splits, one part of the train keeps the identifier (IC4310) for the remainder of its journey. The other part gets a new identifier, in this case IC4410. Even though this is clearly one train, the NMBS website indicates that there are two trips. This causes route planning to think a transfer is needed, even though travellers can remain seated if they are in the right part of the train. NMBS' own route planning takes this into account, but we need a way to determine this for 3rd-party route planning. NMBS does not publish data on which carriages travel where.
IC4310 in trips.txt:
220,000095,88____:007::8821006:8832409:13:1102:20180625,Hamont,4310,,5481,,1
IC4410 in trips.txt:
225,000297,88____:007::8832409:8831005:9:1152:20180625,Hasselt,4410,,5658,,1
Merging trains are similar, and have different identifiers until they merge. Once they merge, one of the trains will continue to travel with the identifier of one of said two trains. It should be noted that the train which loses its identifier on merging doesn't seem to have platform information. Again, route planning will consider this a transfer, even though passengers can remain seated (and thus don't have to transfer). NMBS' own route planning takes this into account, but we need a way to determine this for 3rd-party route planning.
A possible solution for this would be to label the common parts of a journey with both trip ids, by publishing two connection objects for each connection, both objects being identical except for the trip/route id.
Another possible solution would be to create an index of splitting trains, where each row contains two identifiers which belong together in one splitting or merging train. This would be less 'intrusive' to the Linked Connections list, and would prevent certain edge cases (splitting Linked Connections into fragments of a certain filesize could cause two "identical" connections to be split over multiple pages, having to combine and check multiple connection objects would be more complicated compared to only checking a list to determine whether a transfer is 'real' or a split/merge during routeplanning).
Current status
This data is not published in GTFS. There doesn't seem to be any field which implies a train split, or that a train drove part of its journey attached to another train.