Hey! I have generated a simple pandas dataframe that has a schema li


Decoding of timestamp fields generated by pandas/pyarrow

Yanikovic commented on September 25, 2024

Decoding of timestamp fields generated by pandas/pyarrow


Comments (4)

mjakubowski84 commented on September 25, 2024

Hi!

Parquet4s treats timestamps as INT96 (stored as binary) because a Timestamp is a date (Int) plus the time of day in nanoseconds (Long). Datetime64 is less precise. However, that doesn't mean Parquet4s shouldn't be able to read such data.

Can you share a link to a sample file with a datetime64 column, including BC and AD dates, with and without milliseconds? It will help to implement the improvement faster :)

Thanks
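To make the INT96 layout described above concrete, here is a minimal sketch in Python that decodes a 12-byte INT96 value into a datetime. It assumes the conventional layout: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. The function name `decode_int96` is hypothetical and is not part of Parquet4s or pyarrow.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of the Unix epoch (1970-01-01).
JULIAN_DAY_OF_EPOCH = 2440588
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_int96(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp (hypothetical helper).

    Assumed layout: nanoseconds of the day (int64, little-endian)
    followed by the Julian day number (int32, little-endian).
    """
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes long")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_EPOCH
    # Python datetimes have microsecond resolution, so nanoseconds are truncated.
    return EPOCH + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1000)


# One hour into the day after the epoch.
sample = struct.pack("<qi", 3_600_000_000_000, JULIAN_DAY_OF_EPOCH + 1)
decoded = decode_int96(sample)
print(decoded.isoformat())
```

Python's `datetime` cannot represent BC dates, so this sketch only covers years 1 through 9999.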


mbykovskyy commented on September 25, 2024

INT96 has been deprecated (https://issues.apache.org/jira/browse/PARQUET-323), so many tools are now using INT64 to represent timestamps.

I ran into the same issue because my Parquet file was generated by PyArrow, but I was able to work around it by forcing the INT96 type during Parquet file generation.


mjakubowski84 commented on September 25, 2024

That is a fair point about INT64. Parquet4s should support it as a timestamp format in both reads and writes (as an option). Spark, Hive and Impala heavily influenced the library. Even now, they all use INT96 for timestamps, at least by default.

I am going to prioritise the work on this issue. @Yanikovic @mbykovskyy If you do not want to wait, you can write a custom decoder typeclass and transform LongValue to a timestamp.
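The conversion such a custom decoder would need to perform can be sketched as follows. This is only the arithmetic, written in Python for illustration; it is not Parquet4s code and does not use its typeclass API. It assumes the INT64 value counts milliseconds, microseconds, or nanoseconds since the Unix epoch and that the result should be interpreted as UTC. The name `int64_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Number of microseconds per unit, or a divisor for nanoseconds.
_UNITS = ("ms", "us", "ns")


def int64_to_datetime(value: int, unit: str = "us") -> datetime:
    """Convert an INT64 epoch offset to a UTC datetime (hypothetical helper)."""
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    if unit == "ms":
        micros = value * 1000
    elif unit == "us":
        micros = value
    else:
        # Nanoseconds are floored to microseconds, the resolution of datetime.
        micros = value // 1000
    # Adding a timedelta handles negative (pre-1970) offsets correctly.
    return EPOCH + timedelta(microseconds=micros)


one_second = int64_to_datetime(1_000, "ms")
before_epoch = int64_to_datetime(-1, "ms")
print(one_second.isoformat(), before_epoch.isoformat())
```

Which unit applies depends on the logical type annotation in the file's schema, so a real decoder would need to read that rather than assume it.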


mjakubowski84 commented on September 25, 2024

Out-of-the-box typeclasses were released in https://github.com/mjakubowski84/parquet4s/releases/tag/v2.7.0
