hackpartners / darwinpush

This library is based on fasteroute/darwinpush, but we're planning to move fast and break things.

License: Apache License 2.0

Python 100.00%

darwinpush's Introduction

This library is based on fasteroute/darwinpush, but we're planning to move fast and break things.

Our goal is to build a strong base for our bigdatadarwin project.

It is a work in progress.

darwinpush's Issues

Parse snapshots from FTP

Same as #3, but deal with the daily snapshots instead of the 5-min interval logs.

Parse XML logs from FTP

The messages sent over STOMP are incremental updates, so if any of them are missed the data no longer reflects reality.

Adding a way to parse the Darwin logs from FTP, given a start timestamp, would help address this issue.

Messages taken from the logs should be delivered to the listener implemented by the client (the user of this library) in the same way as STOMP messages, and in the order Darwin originally sent them. This keeps the complicated logic of going to FTP, downloading the XML files, and then connecting to STOMP transparent to the client.

An extra, optional argument on the on_*_message methods to indicate the source of each message might also be nice.
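
As a rough illustration of the idea above, the sketch below replays log messages in order and then hands over to live messages, tagging each with its source. Everything here is an assumption made for the example: the method name on_schedule_message (chosen only to match the on_*_message pattern), the "ftp"/"stomp" source labels, and the (sequence, payload) tuples standing in for real Darwin messages. It is not the library's actual API.

```python
class RecordingListener:
    """Hypothetical client listener; records what it receives and from where."""

    def __init__(self):
        self.received = []

    # The default keeps existing callers working if they never pass a source.
    def on_schedule_message(self, message, source="stomp"):
        self.received.append((source, message))


def replay_then_stream(listener, log_messages, stomp_messages):
    """Deliver messages recovered from FTP logs first, then live STOMP messages.

    Each message is a (sequence, payload) tuple; the sequence number is a
    stand-in for whatever ordering key Darwin provides.
    """
    # Logs may be downloaded out of order, so sort them before delivery.
    for message in sorted(log_messages, key=lambda m: m[0]):
        listener.on_schedule_message(message, source="ftp")
    # Live messages are assumed to arrive already in order.
    for message in stomp_messages:
        listener.on_schedule_message(message, source="stomp")


listener = RecordingListener()
replay_then_stream(
    listener,
    log_messages=[(2, "second"), (1, "first")],
    stomp_messages=[(3, "third")],
)
```

The client sees one ordered stream and can ignore the source argument entirely unless it cares where a message came from.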

Determine whether logs or a snapshot is required

We aim to attack all the pain points, so I propose a simple interface for handling logs and snapshots.

Both should be handled transparently by default, but the behaviour should also be configurable at the client/user level.

Darwin keeps messages on its queues for 5 minutes after you disconnect. So if the library is given a parameter, downtime, it is easy to decide whether to download logs or a snapshot.

Values for downtime:

<=0     - first start of the application, download snapshot
>0      - seconds of downtime, based on which the library decides how many logs to download, or maybe no logs (if less than 5 minutes)
None    - do not download anything over FTP, just connect to STOMP
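
The table above could be turned into a small decision function along these lines. This is only a sketch: the names FtpAction and plan_recovery are hypothetical, and the rule for how many logs to fetch (one per started 5-minute interval of downtime) is an assumption, since the issue does not specify it.

```python
import math
from enum import Enum

QUEUE_RETENTION_SECONDS = 5 * 60  # Darwin keeps queued messages for 5 minutes
LOG_INTERVAL_SECONDS = 5 * 60     # logs are published at 5-minute intervals


class FtpAction(Enum):
    NONE = "none"          # connect to STOMP only
    SNAPSHOT = "snapshot"  # download the snapshot
    LOGS = "logs"          # download one or more logs


def plan_recovery(downtime):
    """Map a downtime value to (action, number_of_logs)."""
    if downtime is None:
        # Caller explicitly opted out of FTP.
        return FtpAction.NONE, 0
    if downtime <= 0:
        # First start of the application.
        return FtpAction.SNAPSHOT, 0
    if downtime < QUEUE_RETENTION_SECONDS:
        # The STOMP queue still holds everything that was missed.
        return FtpAction.NONE, 0
    # Assumption: fetch enough logs to cover the whole outage.
    return FtpAction.LOGS, math.ceil(downtime / LOG_INTERVAL_SECONDS)
```

A real implementation would probably also fall back to a snapshot once the downtime exceeds whatever period the logs cover, but that limit is not stated in the issue.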

Parse "ts" timestamp as the creation date of the Darwin message

pPort.ts is the creation time of the Darwin message. It is useful to keep track of it so that we avoid applying older "updates" or duplicated messages, and so that we can keep a history of when events happened for later use.

We should also find where this is documented, if anywhere; I only found out about it from a forum thread at
https://groups.google.com/forum/#!topic/openraildata-talk/5RUyLuzXwic which is a bit vague.
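
A minimal sketch of how the timestamp might be used to skip stale or duplicated updates follows. It assumes ts is an ISO 8601 string with an explicit UTC offset and possibly more than six fractional digits; the sample values, the parse_ts helper and the LatestSeen class are all hypothetical. Treating an equal timestamp as a duplicate is also an assumption that would need checking against real data.

```python
import re
from datetime import datetime

# Split the timestamp into date-time, optional fraction, and UTC offset.
_TS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_ts(ts):
    """Parse a ts attribute value into a timezone-aware datetime."""
    match = _TS_RE.match(ts)
    if match is None:
        raise ValueError("unrecognised ts value: %r" % ts)
    base, fraction, offset = match.groups()
    # datetime only stores microseconds, so truncate or pad to 6 digits.
    fraction = (fraction or "0")[:6].ljust(6, "0")
    if offset == "Z":
        offset = "+00:00"
    return datetime.fromisoformat("%s.%s%s" % (base, fraction, offset))


class LatestSeen:
    """Remembers the newest creation time applied for each key (e.g. a rid)."""

    def __init__(self):
        self._latest = {}

    def should_apply(self, key, ts):
        created = parse_ts(ts)
        previous = self._latest.get(key)
        if previous is not None and created <= previous:
            return False  # older than, or same as, what was already applied
        self._latest[key] = created
        return True


tracker = LatestSeen()
first = tracker.should_apply("201509241060735", "2015-09-24T21:25:00.1234567+01:00")
repeat = tracker.should_apply("201509241060735", "2015-09-24T21:25:00.1234567+01:00")
```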

Graceful shutdown bug

Sometimes when you shut down the server it all goes nice and smooth, but sometimes it doesn't.

There are three processes running: the main STOMP client, the listener and the parser.

The data flows like this: stomp client -> parser -> listener.

The parser has a queue from which it takes one message at a time, parses it, and puts it into the listener queue.

The queues block when they are empty to wait for a message, and that's the problem. Nothing wakes a blocked queue when the close signal is sent, so a clean disconnect only happens when there are still unprocessed messages.

When there are unprocessed messages, the queue does not block and the quit variable gets read.

A simple solution would be to send a dummy/None message to the queues when the processes are about to quit, so that the blocked reads return and the deadlock is avoided.
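
The sentinel approach could look roughly like the sketch below. It uses threads and queue.Queue to stay short and self-contained; the issue describes separate processes, where multiprocessing.Queue would be used in the same way. The worker function names are hypothetical, and "parsing" is reduced to upper-casing a string.

```python
import queue
import threading

STOP = None  # sentinel: a dummy message that means "shut down now"


def parser_worker(raw_queue, parsed_queue):
    while True:
        raw = raw_queue.get()  # blocks while the queue is empty
        if raw is STOP:
            parsed_queue.put(STOP)  # pass the sentinel downstream
            break
        parsed_queue.put(raw.upper())  # stand-in for real parsing


def listener_worker(parsed_queue, delivered):
    while True:
        message = parsed_queue.get()  # blocks while the queue is empty
        if message is STOP:
            break
        delivered.append(message)


def run_pipeline(raw_messages):
    """Push messages through parser -> listener, then shut both down cleanly."""
    raw_queue, parsed_queue, delivered = queue.Queue(), queue.Queue(), []
    workers = [
        threading.Thread(target=parser_worker, args=(raw_queue, parsed_queue)),
        threading.Thread(target=listener_worker, args=(parsed_queue, delivered)),
    ]
    for worker in workers:
        worker.start()
    for raw in raw_messages:
        raw_queue.put(raw)
    raw_queue.put(STOP)  # wakes the parser even if its queue is otherwise empty
    for worker in workers:
        worker.join(timeout=5)
    stopped_cleanly = all(not worker.is_alive() for worker in workers)
    return delivered, stopped_cleanly
```

Because the sentinel travels through the same queues as the data, every message queued before it is still processed, and both workers exit even when no messages are pending.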

Add trainId to Schedule object

The trainId attribute in the XML is missing from the objects, or at least it seems to be missing.

XML example:

<Journey rid="201509241060735" uid="Y57914" trainId="5S20" ssd="2015-09-24" toc="GR" trainCat="EE" isPassengerSvc="false">
  <OPOR tpl="ABRDEEN" act="TB" plat="5" wtd="21:25" />
  <OPIP tpl="ABRD27" act="OP" wta="21:30" wtd="21:40" />
  <OPDT tpl="ABRDCH" act="TF" wta="21:45" />
</Journey>
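
For reference, reading the attribute from the example above is straightforward with the standard library. The Schedule dataclass below is a hypothetical stand-in with only three fields, not the library's real Schedule object, and the sample omits any XML namespace that real Darwin messages may carry.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

SAMPLE = """
<Journey rid="201509241060735" uid="Y57914" trainId="5S20" ssd="2015-09-24" toc="GR" trainCat="EE" isPassengerSvc="false">
  <OPOR tpl="ABRDEEN" act="TB" plat="5" wtd="21:25" />
  <OPIP tpl="ABRD27" act="OP" wta="21:30" wtd="21:40" />
  <OPDT tpl="ABRDCH" act="TF" wta="21:45" />
</Journey>
"""


@dataclass
class Schedule:
    """Hypothetical, trimmed-down schedule object for illustration only."""
    rid: Optional[str]
    uid: Optional[str]
    train_id: Optional[str]


def schedule_from_xml(xml_text):
    element = ET.fromstring(xml_text.strip())
    # Element.get returns None when the attribute is absent.
    return Schedule(
        rid=element.get("rid"),
        uid=element.get("uid"),
        train_id=element.get("trainId"),
    )


schedule = schedule_from_xml(SAMPLE)
```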
