
Comments (14)

JanVanHaaren commented on August 24, 2024

That suggestion sounds good to me. The Wyscout V3 deserializer also fills the timestamp field from the minute and second fields, although it would probably be better to use the provided matchTimestamp field. The StatsBomb deserializer uses the provided timestamp.

Should we explicitly store a sequence number for each event as well? StatsBomb and Wyscout explicitly provide a sequence number in the index and eventIndex fields, respectively.

from kloppy.

probberechts commented on August 24, 2024

Should we explicitly store a sequence number for each event as well? StatsBomb and Wyscout explicitly provide a sequence number in the index and eventIndex fields, respectively.

I would just make sure that the records in a dataset are chronologically ordered. Storing a sequence number then does not provide any added value since you would be able to infer it from the position in the list of records.
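As a minimal sketch of that idea: the record layout and the `provider_index` key below are hypothetical stand-ins (not kloppy's actual data model) for fields such as StatsBomb's index or Wyscout's eventIndex. Once the records are ordered, the sequence number falls out of the list position.

```python
# Hypothetical raw records; "provider_index" stands in for fields such as
# StatsBomb's index or Wyscout's eventIndex.
records = [
    {"provider_index": 3, "type": "shot"},
    {"provider_index": 1, "type": "kick_off"},
    {"provider_index": 2, "type": "pass"},
]

# Order once at deserialization time ...
ordered = sorted(records, key=lambda r: r["provider_index"])

# ... after which the sequence number is simply the list position.
sequence = {r["type"]: position for position, r in enumerate(ordered)}
print(sequence)  # {'kick_off': 0, 'pass': 1, 'shot': 2}
```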


koenvo commented on August 24, 2024

Are there any details on how to sort correctly while maintaining millisecond precision?

A solution could be to derive the timestamp from the “min” and “sec” attributes, but then we lose the precision.


probberechts commented on August 24, 2024

My documentation doesn't mention the precision of the "timestamp" field. However, my version of the documentation is extremely outdated. Maybe @JanVanHaaren has something more up-to-date.

I find it strange that the "timestamp" field does not align with the "min" and "sec" fields. If the precision of the "timestamp" field were inferior to that of the "min" and "sec" fields, I don't see why we would infer an (incorrect) millisecond precision from it.


probberechts commented on August 24, 2024

Looking at a few more timestamps, I now realize that Opta does not add leading zeros to the milliseconds. So, "2018-08-20T21:32:27.98" is actually "2018-08-20T21:32:27.098000".

Python's %f pads zeros to the right, while we should pad zeros to the left to parse the Opta timestamp. We should simply adapt the timestamp parser and then it should work.

%f is an extension to the set of format characters in the C standard (but implemented separately in datetime objects, and therefore always available). When used with the strptime() method, the %f directive accepts from one to six digits and zero pads on the right.
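A minimal sketch of such a parser, assuming the fractional part is a millisecond count written without leading zeros, as observed above. The function name `parse_opta_timestamp` is hypothetical and is not kloppy's actual implementation. It left-pads the fraction to three digits before handing the string to `strptime`.

```python
from datetime import datetime


def parse_opta_timestamp(raw: str) -> datetime:
    """Parse an Opta-style timestamp whose millisecond part lacks leading zeros."""
    raw = raw.rstrip("Z")  # some feeds append a "Z" suffix
    if "." in raw:
        head, fraction = raw.split(".", 1)
        # Left-pad to three digits so "98" is read as 098 ms, not 980 ms.
        raw = f"{head}.{fraction.zfill(3)}"
    else:
        raw = f"{raw}.000"
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f")


# Plain %f pads on the right and therefore misreads the value.
naive = datetime.strptime("2018-08-20T21:32:27.98", "%Y-%m-%dT%H:%M:%S.%f")
fixed = parse_opta_timestamp("2018-08-20T21:32:27.98")
print(naive.microsecond)  # 980000
print(fixed.microsecond)  # 98000
```

The returned datetime is naive on purpose, since the time zone of the field is a separate question.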


JanVanHaaren commented on August 24, 2024

The min and sec fields on one hand and the timestamp field on the other hand provide different pieces of information about an event. The min and sec fields provide the game time in minutes and seconds when the event occurred, whereas the timestamp field provides the date and time when the event was logged in UK time. Hence, the timestamp field can be used as a tie-breaker to order events but not to derive the time when the event occurred in the match.

Documentation Opta F24

  • timestamp - "The UK time/date at which this event was initially entered into Opta’s database"
  • min - "Minute of the event"
  • sec - "Second of the event"

Documentation Stats Perform MA3

  • timestamp - "The UK time/date at which this event was initially entered into Opta's database"
  • timeMin - "Game time in minutes"
  • timeSec - "Game time in seconds"


probberechts commented on August 24, 2024

So, to conclude, would it be okay to fill the "timestamp" field in Kloppy with min + sec and order events based on min + sec + timestamp?
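A rough sketch of that proposal, under stated assumptions: `RawEvent` and `sort_events` are hypothetical names rather than kloppy classes, and putting the period first in the sort key is an extra assumption that keeps first-half stoppage time ahead of second-half events. The provider timestamp is only used to break ties between events with the same minute and second.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RawEvent:  # hypothetical container, not a kloppy class
    period_id: int
    minute: int
    second: int
    logged_at: datetime  # provider "timestamp" field, used only as a tie-breaker

    @property
    def game_time(self) -> timedelta:
        # Value that would fill the "timestamp" field: min + sec only.
        return timedelta(minutes=self.minute, seconds=self.second)


def sort_events(events):
    # Order on period, then min + sec, then the provider timestamp.
    return sorted(
        events, key=lambda e: (e.period_id, e.minute, e.second, e.logged_at)
    )


events = [
    RawEvent(1, 0, 5, datetime(2018, 8, 20, 21, 32, 27, 98000)),
    RawEvent(1, 0, 5, datetime(2018, 8, 20, 21, 32, 27, 12000)),
    RawEvent(1, 0, 0, datetime(2018, 8, 20, 21, 32, 22, 500000)),
]
ordered = sort_events(events)
print([(e.game_time, e.logged_at.microsecond) for e in ordered])
```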


koenvo commented on August 24, 2024

Small question about the timestamp vs min/sec: when the record is not altered afterwards, does the timestamp match the min/sec?
so only when the record is altered the timestamp loses value, correct?


JanVanHaaren commented on August 24, 2024

Small question about the timestamp vs min/sec: when the record is not altered afterwards, does the timestamp match the min/sec? so only when the record is altered the timestamp loses value, correct?

My understanding is that the timestamp field is never updated. The timestamp field reflects the time when the event was initially entered in the database and the last_modified field reflects the time when the event was last updated in the database.

I suspect that the timestamp field is reasonably accurate for events that are recorded live. However, not all event data is recorded live and events can occasionally be inserted at a later time during the match or even after the match.


probberechts commented on August 24, 2024

However, according to my old documentation, the timestamp field reflects the time at which the event occurred within the match. 😕

image


JanVanHaaren commented on August 24, 2024

I will contact the Stats Perform support desk. The official documentation is confusing.

Documentation website

  • timestamp - "The UK time/date at which this event was initially entered into Opta's database"
  • timestamp_utc - "The UTC timestamp of when the event occurred, or when the data was entered in Opta DB"
  • last_modified - "The UK time/date at which this event was last modified by Opta"


JanVanHaaren commented on August 24, 2024

I haven't heard back yet from Stats Perform, but I think I finally understand how the timestamps work. I suspect the meaning of the timestamp field depends on the coverage level. The event timestamps are detailed to the millisecond for some but not all coverage levels.

For example, the event data for this friendly match between Salzburg and Ried has coverage level 14. The game took place on 12 October 2023, but the timestamp for the kick-off event is 2023-10-15T08:49:39.373Z.

{
	"id": "9130ocq9mdrosrd4mv7a666tw",
	"coverageLevel": "14",
	"date": "2023-10-12Z",
	"time": "12:00:00Z",
	"localDate": "2023-10-12",
	"localTime": "14:00:00",
	"numberOfPeriods": 2,
	"periodLength": 45,
	"overtimeLength": 15,
	"lastUpdated": "2023-11-25T12:46:38Z",
	"description": "Salzburg vs Ried",
	...
},
{
	"id": 2604454267,
	"eventId": 3,
	"typeId": 1,
	"periodId": 1,
	"timeMin": 0,
	"timeSec": 0,
	"contestantId": "do3l4dhs0ooog6se728jxc06z",
	"playerId": "3rmiekqhf431q783nhdc2m12h",
	"playerName": "W. Eza",
	"outcome": 1,
	"x": 49.8,
	"y": 50.0,
	"timeStamp": "2023-10-15T08:49:39.373Z",
	"lastModified": "2023-10-16T00:39:15Z",
	"qualifier": [
		...
	]
},
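To make the mismatch concrete, here is a small sketch that measures the gap between the scheduled kick-off and the kick-off event's timeStamp, using only the values from the excerpt above. Treating the "Z" suffix as UTC is an assumption, and the dictionary names are illustrative.

```python
from datetime import datetime, timezone

# Values copied from the match info and kick-off event shown above.
match_info = {"date": "2023-10-12Z", "time": "12:00:00Z"}
kickoff_event = {"timeMin": 0, "timeSec": 0, "timeStamp": "2023-10-15T08:49:39.373Z"}

# Assumption: the trailing "Z" means UTC for both the match and the event.
scheduled = datetime.strptime(
    match_info["date"].rstrip("Z") + "T" + match_info["time"].rstrip("Z"),
    "%Y-%m-%dT%H:%M:%S",
).replace(tzinfo=timezone.utc)
logged = datetime.strptime(
    kickoff_event["timeStamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)

gap = logged - scheduled
print(gap)  # 2 days, 20:49:39.373000
```

A gap of almost three days for an event at game time 0:00 suggests that, for this match, the timeStamp records when the event was entered rather than when it happened.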


probberechts commented on August 24, 2024

The question is rather whether they can be used as a reliable way to measure the relative time that has passed since the "period start" event.
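One way to check this on real data might look like the sketch below. The function name `clock_drift` and all values in the usage example are hypothetical and not taken from any feed. It compares the time elapsed since the "period start" event, derived from the timestamps, with the game clock given by the minute and second fields.

```python
from datetime import datetime, timedelta


def clock_drift(
    period_start: datetime,
    event_timestamp: datetime,
    minute: int,
    second: int,
    period_offset: timedelta = timedelta(0),
) -> timedelta:
    """Difference between timestamp-derived elapsed time and the game clock.

    period_offset is the game clock at the start of the period
    (for example 45 minutes for a second half).
    """
    elapsed_from_timestamps = event_timestamp - period_start
    elapsed_from_clock = timedelta(minutes=minute, seconds=second) - period_offset
    return elapsed_from_timestamps - elapsed_from_clock


# Made-up values for illustration only.
start = datetime(2024, 1, 1, 20, 0, 0)
event = datetime(2024, 1, 1, 20, 12, 31, 250000)
drift = clock_drift(start, event, minute=12, second=30)
print(drift.total_seconds())  # 1.25
```

If the drift stays within a second or so across a match, the timestamps are plausibly usable as relative time; large or erratic values would point to events entered after the fact.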


JanVanHaaren commented on August 24, 2024

I don't know yet, but my feeling is that it should be possible for the highest coverage levels. I'll investigate a few more matches. Unfortunately, I don't have access to much event data that was collected at lower coverage levels.

