Comments (4)
Hi!
Parquet4s treats timestamps as INT96 (stored as a binary), because an INT96 timestamp is a Julian day number (Int) plus the time of day in nanos (Long). Datetime64 is less precise. However, that doesn't mean that Parquet4s shouldn't be able to read such data.
Can you share a link to a sample file with a datetime64 column, including BC & AD dates, with and without millis? It will help to implement the improvement faster :)
Thanks
from parquet4s.
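The INT96 layout described above (nanos of the day in the first 8 bytes, Julian day number in the last 4, both little-endian, with Julian day 2440588 being the Unix epoch) can be decoded with plain JDK classes. A minimal sketch; the object and method names are illustrative, not parquet4s API:

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.time.Instant

object Int96 {
  // Julian day number of 1970-01-01 (the Unix epoch).
  private val JulianDayOfUnixEpoch = 2440588L
  private val SecondsPerDay = 86400L
  private val NanosPerSecond = 1000000000L

  // Decodes a 12-byte INT96 timestamp: 8 bytes of nanos-of-day
  // followed by 4 bytes of Julian day, both little-endian.
  def toInstant(bytes: Array[Byte]): Instant = {
    require(bytes.length == 12, "an INT96 timestamp must be 12 bytes")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanosOfDay = buf.getLong
    val epochDay = buf.getInt - JulianDayOfUnixEpoch
    Instant.ofEpochSecond(
      epochDay * SecondsPerDay + nanosOfDay / NanosPerSecond,
      nanosOfDay % NanosPerSecond
    )
  }
}
```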
INT96 has been deprecated (https://issues.apache.org/jira/browse/PARQUET-323), so many tools now use INT64 to represent timestamps.
I ran into the same issue because my Parquet file was generated by PyArrow, but I was able to work around it by forcing the INT96 type during Parquet file generation.
That is a fair point about INT64. Parquet4s should support it as a timestamp format both in reads and writes (as an option). Spark, Hive, and Impala heavily influenced the library, and even now they all use INT96 for timestamps, at least by default.
I am going to prioritise the work on this issue. @Yanikovic @mbykovskyy If you do not want to wait, you can write a custom decoder typeclass and transform a LongValue into a timestamp.
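The conversion suggested above boils down to interpreting the raw Long according to the unit declared in the column's TIMESTAMP logical type (millis, micros, or nanos). A plain-Scala sketch of that conversion only; the object and method names are illustrative, and wiring it into a parquet4s decoder typeclass depends on the library version:

```scala
import java.time.Instant
import java.util.concurrent.TimeUnit

object Int64Timestamps {
  // Converts a raw INT64 timestamp value to an Instant, given the unit
  // from the column's logical type. floorDiv/floorMod keep pre-epoch
  // (negative) values correct.
  def toInstant(raw: Long, unit: TimeUnit): Instant = unit match {
    case TimeUnit.MILLISECONDS => Instant.ofEpochMilli(raw)
    case TimeUnit.MICROSECONDS =>
      Instant.ofEpochSecond(Math.floorDiv(raw, 1000000L), Math.floorMod(raw, 1000000L) * 1000L)
    case TimeUnit.NANOSECONDS =>
      Instant.ofEpochSecond(Math.floorDiv(raw, 1000000000L), Math.floorMod(raw, 1000000000L))
    case other => throw new IllegalArgumentException(s"unsupported timestamp unit: $other")
  }
}
```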
Out-of-the-box typeclasses were released with https://github.com/mjakubowski84/parquet4s/releases/tag/v2.7.0