Comments (5)
take
from arrow-datafusion.
...while looking into this I noticed that there are no statistics written for an Interval, which is also described here.
@alamb I guess we can't extract any statistics here? And writing tests that check that no statistics are written does not seem very helpful?
> @alamb I guess we can't extract any statistics here? And writing any tests that check we have no statistics written, does not seem to be very helpful?
I actually think these would be helpful: as soon as there are statistics, we can hook them up to the tests. If you had time to write the tests, that would be great. We can then perhaps file a ticket in parquet-rs for supporting writing statistics for interval types.
Sure, I can do that. Off the top of my head: the `fn run()` of the struct `Test` panics if we can't extract any statistics, which is the case here. So I'd prepare as much as possible (creating record batches, adding a Scenario, writing those tests), but for now the tests would assert that `run()` should panic. Does this make sense to you @alamb?
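A minimal sketch of that plan, for illustration only. The `Test` struct and its fields below are simplified, hypothetical stand-ins and not the actual DataFusion test helper; the sketch only assumes that `run()` panics when no statistics can be extracted, as described above. It shows how a test can be pinned with `#[should_panic]` until statistics become available.

```rust
/// Simplified stand-in for the test helper discussed above.
/// All fields are hypothetical and only serve this illustration.
struct Test {
    /// Min/max statistics extracted from the file, if any were written.
    statistics: Option<(i64, i64)>,
    expected_min: i64,
    expected_max: i64,
}

impl Test {
    /// Panics when no statistics could be extracted; otherwise checks them.
    fn run(self) {
        let (min, max) = self
            .statistics
            .expect("no statistics could be extracted");
        assert_eq!(min, self.expected_min);
        assert_eq!(max, self.expected_max);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Interval columns currently have no statistics, so `run()` is expected
    // to panic. Once statistics are written, drop `#[should_panic]` and fill
    // in the real expected values.
    #[test]
    #[should_panic(expected = "no statistics could be extracted")]
    fn test_interval_statistics() {
        Test { statistics: None, expected_min: 0, expected_max: 0 }.run();
    }
}
```

The benefit of this shape is that the test starts failing as soon as statistics appear, which is the signal to replace the panic expectation with real assertions.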
I did some digging to find out why, or where, writing those statistics is not supported (yet).
Since I'm not familiar with the parquet impl, here are my findings, which might be useful in a follow-up ticket in arrow-rs.
- In `fn write_slice()`, the min and max values are never updated because of a filter condition that checks whether the type is INTERVAL
- In order to support updating the min and max values, we need to handle the comparison of INTERVAL here
I think this should be possible; put differently, I don't yet see why this is not supported.
Something similar (comparing FixedLenByteArrays) is already done for DECIMAL here?
Perhaps you have some more information on this @alamb; otherwise this might be enough information to file a ticket in arrow-rs?
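To make the comparison question concrete, here is a rough sketch of what comparing INTERVAL values could look like. It is not code from arrow-rs, and all function names are hypothetical. It relies on the Parquet INTERVAL layout: a 12-byte fixed-length array holding three little-endian unsigned 32-bit integers (months, days, milliseconds). The ordering used here, comparing (months, days, milliseconds) as a tuple, is purely an assumption for illustration. As far as I understand, the Parquet format documentation describes the sort order of INTERVAL as undefined (one month is not comparable to 31 days in general), which may be the reason the writer skips the min/max update.

```rust
use std::cmp::Ordering;

/// Encode (months, days, millis) into the 12-byte INTERVAL layout.
fn encode_interval(months: u32, days: u32, millis: u32) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[0..4].copy_from_slice(&months.to_le_bytes());
    out[4..8].copy_from_slice(&days.to_le_bytes());
    out[8..12].copy_from_slice(&millis.to_le_bytes());
    out
}

/// Decode a 12-byte INTERVAL value into (months, days, millis).
fn decode_interval(bytes: &[u8; 12]) -> (u32, u32, u32) {
    let months = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let days = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let millis = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
    (months, days, millis)
}

/// Assumed ordering: compare the decoded fields lexicographically.
/// A plain byte-wise comparison would be wrong, because the fields
/// are stored little-endian.
fn compare_intervals(a: &[u8; 12], b: &[u8; 12]) -> Ordering {
    decode_interval(a).cmp(&decode_interval(b))
}

/// Fold a slice of values into (min, max) under the assumed ordering.
fn min_max(values: &[[u8; 12]]) -> Option<([u8; 12], [u8; 12])> {
    let first = *values.first()?;
    Some(values.iter().fold((first, first), |(min, max), v| {
        let min = if compare_intervals(v, &min) == Ordering::Less { *v } else { min };
        let max = if compare_intervals(v, &max) == Ordering::Greater { *v } else { max };
        (min, max)
    }))
}
```

Under this assumed ordering, an interval of 1 month sorts above one of 31 days, which shows why any fixed ordering is a design decision rather than an obvious default.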