Comments (2)
How naive of me: as always, time-related machinery is like a rabbit hole 😅
Ambiguity
Given
- Date Time: 2019-01-01T00:00
- Time Zone: Africa/Nairobi
Current LocalDateTime-based encoder "output"
- Julian day: 2458485
- Nanos: -10800000000000
Suggested Instant-based encoder "output"
- Julian day: 2458484
- Nanos: 75600000000000
Both these outputs denote the same point in time. The Instant-based decoder correctly restores the value produced by the LocalDateTime-based encoder, and vice versa. But this is still an observable behavior change. I suppose this is undesirable (or not?). Anyway, I have no idea which of the outputs is correct/preferable because of the next issue.
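To make the arithmetic concrete, here is a minimal sketch that reproduces both pairs of numbers and shows that they decode to the same epoch nanoseconds. It is not the parquet4s implementation: the function names (encode_local, encode_instant, decode) are hypothetical, and Africa/Nairobi is modelled as a fixed UTC+3 offset, which is an assumption made to keep the example free of time zone database lookups.

```python
from datetime import date, datetime, time, timedelta

JULIAN_DAY_OF_EPOCH = 2440588      # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 10**9


def to_nanos(delta: timedelta) -> int:
    """Convert a timedelta to whole nanoseconds using integer arithmetic."""
    return (delta // timedelta(microseconds=1)) * 1000


def encode_local(local: datetime, utc_offset: timedelta) -> tuple[int, int]:
    """Hypothetical local-date-based encoding: the day comes from the local
    calendar date, and the zone offset is subtracted from the time of day,
    so the nanos part can go negative."""
    julian_day = JULIAN_DAY_OF_EPOCH + (local.date() - date(1970, 1, 1)).days
    time_of_day = local - datetime.combine(local.date(), time())
    return julian_day, to_nanos(time_of_day - utc_offset)


def encode_instant(local: datetime, utc_offset: timedelta) -> tuple[int, int]:
    """Hypothetical instant-based encoding: convert to UTC first, then split
    epoch nanoseconds into whole days and a non-negative remainder."""
    epoch_nanos = to_nanos(local - utc_offset - datetime(1970, 1, 1))
    days, nanos = divmod(epoch_nanos, NANOS_PER_DAY)
    return JULIAN_DAY_OF_EPOCH + days, nanos


def decode(julian_day: int, nanos: int) -> int:
    """Return nanoseconds since 1970-01-01T00:00Z for a (day, nanos) pair."""
    return (julian_day - JULIAN_DAY_OF_EPOCH) * NANOS_PER_DAY + nanos


local = datetime(2019, 1, 1, 0, 0)
nairobi_offset = timedelta(hours=3)  # assumption: fixed UTC+3

print(encode_local(local, nairobi_offset))    # (2458485, -10800000000000)
print(encode_instant(local, nairobi_offset))  # (2458484, 75600000000000)
print(decode(*encode_local(local, nairobi_offset))
      == decode(*encode_instant(local, nairobi_offset)))  # True
```

The only difference between the two is where the day boundary is drawn: the first keeps the local calendar day and lets the nanos part go negative, while the second normalizes to the UTC day so that nanos stay within 0 and one day.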
INT96 format itself
The INT96 format is deprecated (see here) and even lacks any documentation (see discussion here).
With all that said, wouldn't it be better to soft-deprecate the default timestamp codecs in parquet4s and encourage users to choose INT64-based ones? Also, we could switch to an Instant internally and use the LocalDateTime-based implementation only as a fallback for the INT96 format.
from parquet4s.
I suppose this is undesirable (or not?)
I guess there must be a small bug that's causing it. It is definitely undesirable, because it produces different data.
The INT96 format is deprecated (see here) and even lacks any documentation (see discussion apache/parquet-format#49).
Yes, it is deprecated, and yet (!) it is still the default format in such prominent tools as Spark, Impala and many others.