Comments (2)
Related to #9354.
from arrow-datafusion.
I think we could add this as a user-defined table function in datafusion-cli quite easily.
For example, we could follow the model of parquet_metadata
(which itself follows DuckDB):
SELECT path_in_schema, row_group_id, row_group_num_rows, stats_min, stats_max, total_compressed_size
FROM parquet_metadata('hits.parquet')
WHERE path_in_schema = '"WatchID"'
LIMIT 3;
+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
| path_in_schema | row_group_id | row_group_num_rows | stats_min           | stats_max           | total_compressed_size |
+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
| "WatchID"      | 0            | 450560             | 4611687214012840539 | 9223369186199968220 | 3883759               |
| "WatchID"      | 1            | 612174             | 4611689135232456464 | 9223371478009085789 | 5176803               |
| "WatchID"      | 2            | 344064             | 4611692774829951781 | 9223363791697310021 | 3031680               |
+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
3 rows in set. Query took 0.053 seconds.
parquet_metadata is documented here: https://arrow.apache.org/datafusion/user-guide/cli.html
The code for it is here: https://github.com/apache/arrow-datafusion/blob/637293580db0634a4efbd3f52e4700992ee3080d/datafusion-cli/src/functions.rs#L215-L442