
parquetviewer's Introduction

ParquetViewer

Simple Windows desktop application for viewing & querying Apache Parquet files.

Main UI

Summary

ParquetViewer is a utility to quickly view Apache Parquet files on Windows desktop machines.

If you'd like to add any new features, feel free to send a pull request.

Some key features:

  • View Parquet file metadata
  • Run simple SQL queries on Parquet data
  • Open single or partitioned files

Download

Releases can be found here: https://github.com/mukunku/ParquetViewer/releases

Details on how to use the utility can be found in the Wiki.

Analytics

Users can opt in to share anonymous usage data to help make the app better.¹

Check out the ParquetViewer Analytics Dashboard if you're interested!

Technical Details

The latest version of this project was written in C# using Microsoft Visual Studio Community 2022 v17.8.3 and .NET 7.

Acknowledgements

This utility would not be possible without: https://github.com/aloneguid/parquet-dotnet
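
For context, a minimal sketch of the kind of work that library does for this utility (a hedged example assuming the parquet-dotnet 4.x API; the file name is hypothetical):

    using System;
    using Parquet;
    using Parquet.Schema;

    // Open a Parquet file and read every column of the first row group.
    using var reader = await ParquetReader.CreateAsync("data.parquet");
    using var rowGroup = reader.OpenRowGroupReader(0);
    foreach (DataField field in reader.Schema.GetDataFields())
    {
        var column = await rowGroup.ReadColumnAsync(field);
        Console.WriteLine($"{field.Name}: {column.Data.Length} values");
    }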

Footnotes

  1. Full privacy policy here: https://github.com/mukunku/ParquetViewer/wiki/Privacy-Policy ↩

parquetviewer's People

Contributors

brentbundang, dannysummerlin, felipepessoto, lotsahelp, mortenf, mukunku, sujeendran, swimmingtiger


parquetviewer's Issues

[BUG] Exception repeatedly thrown when widening window to include a column containing arrays of bytes

Parquet Viewer Version
Version 2.6.0 and later. Works just fine on earlier versions.

Where was the parquet file created?
Locally using C#

Sample File
Contains PHI. I can make a sanitized one if you think it is needed. Let me know if you have trouble reproducing with a parquet file containing a byte[] column.

Describe the bug
When the window is widened to show the column in question, the application repeatedly throws the exception in the attachment.

Screenshots
image

Additional context
I suspect the app is having a problem with a column of type byte[].
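
For anyone trying to reproduce without PHI, a minimal sketch (not from the original report; assumes the parquet-dotnet 4.x API) that writes a file containing a byte[] column:

    using System.IO;
    using Parquet;
    using Parquet.Data;
    using Parquet.Schema;

    // Write a single-column file whose values are raw byte arrays (BYTE_ARRAY).
    var field = new DataField<byte[]>("payload");
    var schema = new ParquetSchema(field);
    var column = new DataColumn(field, new[] { new byte[] { 1, 2, 3 }, new byte[] { 4 } });

    using var stream = File.Create("bytes.parquet");
    using var writer = await ParquetWriter.CreateAsync(schema, stream);
    using var rowGroup = writer.CreateRowGroup();
    await rowGroup.WriteColumnAsync(column);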

Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.

Severity Code Description Project File Line Suppression State Error The expression "[System.Version]::Parse('')" cannot be evaluated. Version string portion was too short or too long.

Hello, I'm getting the error below while building the project. Please suggest what to do about this issue.

Severity Code Description Project File Line Suppression State
Error The expression "[System.Version]::Parse('')" cannot be evaluated. Version string portion was too short or too long. ParquetFileViewer

Severity Code Description Project File Line Suppression State
Error CS0012 The type 'Object' is defined in an assembly that is not referenced. You must add a reference to assembly 'netstandard, Version=2.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'. ParquetFileViewer C:\Users\ssharma1\ParquetViewer\src\ParquetFileViewer\UtilityMethods.cs 16 Active

Capture

Select all fields by default, when opening a file

When opening a parquet file, ParquetViewer first launches a popup "Select fields to load", where you either can confirm to load all fields, or select the fields you want.

In all use cases relevant to me, I want to display all fields. Hence I'm wondering if it would be possible to skip this popup altogether? It's just inconvenient to always confirm "All fields..." before you see any data.

Maybe skipping this popup can become a configuration option somewhere?

[FEAT] View per-column metadata

Describe the feature you'd like to be added to Parquet Viewer

I would like the metadata viewer to show the custom key_value_metadata added to each column of the schema. PyArrow's API seems to allow this to be added at the schema level, while Parquet.Net's API adds it per row group, which is more in line with the actual file structure.

Share why this feature would be a good addition to the utility

I want to validate that I'm building Parquet files correctly with the data I expect. I would like to use metadata for per-column information like units and description.
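
As a point of reference, file-level key/value metadata is already reachable through parquet-dotnet; a hedged sketch (assuming CustomMetadata is the file-level store in the 4.x API; per-column key_value_metadata would need extra plumbing):

    using System;
    using Parquet;

    // Print the file-level custom key/value metadata.
    using var reader = await ParquetReader.CreateAsync("file.parquet");
    foreach (var kv in reader.CustomMetadata)
        Console.WriteLine($"{kv.Key} = {kv.Value}");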

[BUG] Close button is not at the top right

Parquet Viewer Version
v2.3.6

Describe the bug
The window's close button is not exactly at the top-right corner of the screen.
Clicking at the very top right closes the window behind it instead.

Screenshots
Untitled

Additional context
You must click exactly at the very top right.
There is only about a 1px gap.

[FEAT] Support for the ZSTD compression format

Describe the feature you'd like to be added to Parquet Viewer
Support for the ZSTD compression format (and possibly other additional compression formats).

Share why this feature would be a good addition to the utility
I have some Parquet files compressed with ZSTD; when opening them I get the error in the screenshot below.

Screenshots
image

[FEAT] Add metadata viewer

Would it be possible to add an element to view the custom file metadata?

This would be useful for accessing additional fields that may be stored as metadata in the file.

[BUG] Opening ParquetViewer to an empty view

Parquet Viewer Version
What version of Parquet Viewer are you experiencing the issue with?
2.3.6.22567

Where was the parquet file created?
Apache Spark, Hive, Java, C#, pyarrow, etc.
Python through pandas with pyarrow backend

Sample File
Upload a sample file so the issue can be debugged!
DLIx12_test.zip

Describe the bug
A clear and concise description of what the bug is.
Opening the file in ParquetViewer gives this as output. No idea what is wrong.
Could it be linked to pyarrow version 9? I have no problem opening a same-sized file written with version 8.

image

Screenshots
If applicable, add screenshots to help explain your problem.

Thrift Metadata

{
"Version": 2,
"Num_rows": 3835405,
"Created_by": "parquet-cpp-arrow version 9.0.0",
"Schema": [
{
"Field_id": 0,
"Name": "Transmission_Number",
"Type": "INT64",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Line_Number",
"Type": "INT64",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "DC",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "GOLD_Article",
"Type": "INT64",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "GOLD_Storage_Loc",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "GOLD_Logistic_Variant",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "SAP_Article",
"Type": "INT64",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Delivery_Quantity_Base_Unit",
"Type": "INT64",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Base_UOM",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Delivery_Quantity_Preparation_Unit",
"Type": "INT64",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Preparation_UOM",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Client",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Shipping_Date",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "GOLD_Shipping_Id",
"Type": "INT64",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Line_Id",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "SAP_OBD_Number",
"Type": "INT64",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Article_Managed_by_Unit",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Shipped_Net_Weight",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Missing_Quantity",
"Type": "INT64",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Missing_Weight",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Pallet_Number",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Mother_Pallet",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Daughter_Pallet",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Code_Picker",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Code_Loader",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Route",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Tour_Rank",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Promo_Number",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Full_Pallet",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Depot_Origine_Transfer",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Depot_Preparation",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Order_Date",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Empty_Included",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Empty_Article_Number_Pallet",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Empty_Article_Number_CV",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Crate_Number",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Warehouse_number",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Child_OBD_Number",
"Type": "INT64",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Missing_Motivation",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Promo_Week",
"Type": "DOUBLE",
"Type_length": 0,
"LogicalType": null,
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Delivery_Type",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
},
{
"Field_id": 0,
"Name": "Source_Name",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(STRING: StringType())",
"Scale": 0,
"Precision": 0,
"Repetition_type": "OPTIONAL",
"Converted_type": "UTF8"
}
]
}

Arrow: Schema

{
"Fields": {
"Transmission_Number": {
"DataType": {
"TypeId": 9,
"Name": "int64",
"BitWidth": 64,
"IsSigned": true,
"IsFixedWidth": true
},
"Name": "Transmission_Number",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Line_Number": {
"DataType": {
"TypeId": 9,
"Name": "int64",
"BitWidth": 64,
"IsSigned": true,
"IsFixedWidth": true
},
"Name": "Line_Number",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"DC": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "DC",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"GOLD_Article": {
"DataType": {
"TypeId": 9,
"Name": "int64",
"BitWidth": 64,
"IsSigned": true,
"IsFixedWidth": true
},
"Name": "GOLD_Article",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"GOLD_Storage_Loc": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "GOLD_Storage_Loc",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"GOLD_Logistic_Variant": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "GOLD_Logistic_Variant",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"SAP_Article": {
"DataType": {
"TypeId": 9,
"Name": "int64",
"BitWidth": 64,
"IsSigned": true,
"IsFixedWidth": true
},
"Name": "SAP_Article",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Delivery_Quantity_Base_Unit": {
"DataType": {
"TypeId": 9,
"Name": "int64",
"BitWidth": 64,
"IsSigned": true,
"IsFixedWidth": true
},
"Name": "Delivery_Quantity_Base_Unit",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Base_UOM": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Base_UOM",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Delivery_Quantity_Preparation_Unit": {
"DataType": {
"TypeId": 9,
"Name": "int64",
"BitWidth": 64,
"IsSigned": true,
"IsFixedWidth": true
},
"Name": "Delivery_Quantity_Preparation_Unit",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Preparation_UOM": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Preparation_UOM",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Client": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Client",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Shipping_Date": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Shipping_Date",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"GOLD_Shipping_Id": {
"DataType": {
"TypeId": 9,
"Name": "int64",
"BitWidth": 64,
"IsSigned": true,
"IsFixedWidth": true
},
"Name": "GOLD_Shipping_Id",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Line_Id": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Line_Id",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"SAP_OBD_Number": {
"DataType": {
"TypeId": 9,
"Name": "int64",
"BitWidth": 64,
"IsSigned": true,
"IsFixedWidth": true
},
"Name": "SAP_OBD_Number",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Article_Managed_by_Unit": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Article_Managed_by_Unit",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Shipped_Net_Weight": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Shipped_Net_Weight",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Missing_Quantity": {
"DataType": {
"TypeId": 9,
"Name": "int64",
"BitWidth": 64,
"IsSigned": true,
"IsFixedWidth": true
},
"Name": "Missing_Quantity",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Missing_Weight": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Missing_Weight",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Pallet_Number": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Pallet_Number",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Mother_Pallet": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Mother_Pallet",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Daughter_Pallet": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Daughter_Pallet",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Code_Picker": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Code_Picker",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Code_Loader": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Code_Loader",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Route": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Route",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Tour_Rank": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Tour_Rank",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Promo_Number": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Promo_Number",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Full_Pallet": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Full_Pallet",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Depot_Origine_Transfer": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Depot_Origine_Transfer",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Depot_Preparation": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Depot_Preparation",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Order_Date": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Order_Date",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Empty_Included": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Empty_Included",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Empty_Article_Number_Pallet": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Empty_Article_Number_Pallet",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Empty_Article_Number_CV": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Empty_Article_Number_CV",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Crate_Number": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Crate_Number",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Warehouse_number": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Warehouse_number",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Child_OBD_Number": {
"DataType": {
"TypeId": 9,
"Name": "int64",
"BitWidth": 64,
"IsSigned": true,
"IsFixedWidth": true
},
"Name": "Child_OBD_Number",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Missing_Motivation": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Missing_Motivation",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Promo_Week": {
"DataType": {
"TypeId": 12,
"Name": "double",
"BitWidth": 64,
"IsSigned": true,
"Precision": 2,
"IsFixedWidth": true
},
"Name": "Promo_Week",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Delivery_Type": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Delivery_Type",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
},
"Source_Name": {
"DataType": {
"TypeId": 13,
"Name": "utf8",
"IsFixedWidth": false
},
"Name": "Source_Name",
"IsNullable": true,
"HasMetadata": false,
"Metadata": null
}
},
"Metadata": {
"pandas": "{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 3835405, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "Transmission_Number", "field_name": "Transmission_Number", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "Line_Number", "field_name": "Line_Number", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "DC", "field_name": "DC", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "GOLD_Article", "field_name": "GOLD_Article", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "GOLD_Storage_Loc", "field_name": "GOLD_Storage_Loc", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "GOLD_Logistic_Variant", "field_name": "GOLD_Logistic_Variant", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "SAP_Article", "field_name": "SAP_Article", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "Delivery_Quantity_Base_Unit", "field_name": "Delivery_Quantity_Base_Unit", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "Base_UOM", "field_name": "Base_UOM", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Delivery_Quantity_Preparation_Unit", "field_name": "Delivery_Quantity_Preparation_Unit", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "Preparation_UOM", "field_name": "Preparation_UOM", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Client", "field_name": "Client", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Shipping_Date", "field_name": "Shipping_Date", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "GOLD_Shipping_Id", "field_name": "GOLD_Shipping_Id", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "Line_Id", "field_name": "Line_Id", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "SAP_OBD_Number", "field_name": "SAP_OBD_Number", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "Article_Managed_by_Unit", "field_name": "Article_Managed_by_Unit", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Shipped_Net_Weight", "field_name": "Shipped_Net_Weight", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Missing_Quantity", "field_name": "Missing_Quantity", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "Missing_Weight", "field_name": "Missing_Weight", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Pallet_Number", "field_name": "Pallet_Number", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Mother_Pallet", "field_name": "Mother_Pallet", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Daughter_Pallet", "field_name": "Daughter_Pallet", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Code_Picker", "field_name": "Code_Picker", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Code_Loader", "field_name": "Code_Loader", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Route", "field_name": "Route", "pandas_type": "float64", "numpy_type": 
"float64", "metadata": null}, {"name": "Tour_Rank", "field_name": "Tour_Rank", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Promo_Number", "field_name": "Promo_Number", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Full_Pallet", "field_name": "Full_Pallet", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Depot_Origine_Transfer", "field_name": "Depot_Origine_Transfer", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Depot_Preparation", "field_name": "Depot_Preparation", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Order_Date", "field_name": "Order_Date", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Empty_Included", "field_name": "Empty_Included", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Empty_Article_Number_Pallet", "field_name": "Empty_Article_Number_Pallet", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Empty_Article_Number_CV", "field_name": "Empty_Article_Number_CV", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Crate_Number", "field_name": "Crate_Number", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Warehouse_number", "field_name": "Warehouse_number", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Child_OBD_Number", "field_name": "Child_OBD_Number", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "Missing_Motivation", "field_name": "Missing_Motivation", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Promo_Week", "field_name": "Promo_Week", "pandas_type": "float64", "numpy_type": "float64", "metadata": null}, {"name": "Delivery_Type", "field_name": "Delivery_Type", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "Source_Name", "field_name": "Source_Name", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}], "creator": {"library": "pyarrow", "version": "9.0.0"}, "pandas_version": "1.4.3"}"
},
"HasMetadata": true
}

Additional context
Add any other context about the problem here.

Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.

Support opening a .parquet file directly from Windows Explorer

Thanks a lot for providing this Viewer!

I'm wondering if it should be possible to open a .parquet file directly from Windows Explorer with a mouse click. Currently, when I do this, ParquetViewer starts up, but with an empty window. I then need to open the file manually via main menu -> Open and browse through the directory structure.

Maybe it's just a configuration that's missing on my side?
Otherwise, this is a kind feature request :)

Tobias
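
For reference, a minimal sketch of how a WinForms app can pick up a file path passed by Explorer; MainForm and OpenFile are hypothetical names, not ParquetViewer's actual code:

    using System;
    using System.IO;
    using System.Windows.Forms;

    static class Program
    {
        [STAThread]
        static void Main(string[] args)
        {
            Application.EnableVisualStyles();
            var form = new MainForm();          // hypothetical main form type
            // A file double-clicked in Explorer arrives as the first argument.
            if (args.Length > 0 && File.Exists(args[0]))
                form.OpenFile(args[0]);         // hypothetical method
            Application.Run(form);
        }
    }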

[BUG] App doesn't launch via pre-compiled binaries or any IDE besides Visual Studio

Parquet Viewer Version
2.4.2

Where was the parquet file created?
Not a parquet-related issue

Sample File
Not a parquet-related issue

Describe the bug
Launching the latest pre-compiled binary fails silently.
Launching the project using dotnet run fails silently.
Launching the project using JetBrains Rider produces the error Appx recipe file[] does not exist.
Launching the project using VS2022 works fine, and afterwards the produced binary launches fine as well.

Screenshots
I don't think they're helpful here

Additional context
Windows 10 21H2
dotnet cli version - 7.0.101
dotnet sdks installed - 7.0.101, 6.0.100
.NET 4.6 installed
.NET 4.7.2 or later is installed, or at least that's what 4.7.2 installer tells me

I'm not too familiar with Windows GUI app development, so I don't really know what to look at. If anything, the error from running it in Rider seems to be the biggest hint, but I can't really find much information on that specific error.

[FEAT] Display Rowgroup info

In the metadata viewer we could have a tab with row group info: the row group count, plus the size and number of rows of each group.
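
The underlying numbers are already exposed by parquet-dotnet; a hedged sketch (assuming the 4.x API; the file name is hypothetical) of collecting them:

    using System;
    using Parquet;

    // Count the row groups and the rows in each one.
    using var reader = await ParquetReader.CreateAsync("file.parquet");
    Console.WriteLine($"Row groups: {reader.RowGroupCount}");
    for (int i = 0; i < reader.RowGroupCount; i++)
    {
        using var rowGroup = reader.OpenRowGroupReader(i);
        Console.WriteLine($"  group {i}: {rowGroup.RowCount} rows");
    }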

Reader for compression '4' not supported

Is there a possibility that this could be implemented?
My files are compressed using pandas with Brotli compression (codec id 4 in the Parquet format); gzip compression works.
df.to_parquet(path=saveurl, compression='brotli')
image

Exception when opening file (1.6 MB)

I am exporting a table from SQL Server to Parquet and back. I am using Parquet.NET v3.0 to perform the export, and it seems to run normally.
Opening one file (~266 KB) with the viewer works just fine. Another, larger file (1.6 MB) throws the following exception:

Exception ParquetViewer

after the column selection dialog.

I investigated further: this only happens if I select string columns that had been of type "varchar" on SQL Server. Any "nvarchar" column works just fine.

However, when passed to the writer they are converted to regular C# strings first, so encoding should not normally be an issue.

[BUG] Cannot open Parquet file with 2 similar column names (different case)

Parquet Viewer Version
What version of Parquet Viewer are you experiencing the issue with?
2.4.2.0

Where was the parquet file created?
pyarrow

Sample File
Example.zip

Describe the bug
I believe the bug comes from having two column names that are equal when compared as lowercase.
I can open the file in pyarrow/Python, but not in ParquetViewer.
Screenshot 2023-01-27 113819

Screenshots
Screenshot 2023-01-27 113152

Additional context
The similar column names are a bug in my own code, but they should not make the program crash.

Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.

[BUG] Unable to display data exported from Oracle database

Parquet Viewer Version: 2.7.1.0

Where was the parquet file created?
python, using pandas and fastparquet library

Sample File
Sample file attached.

Example.zip

Describe the bug
Try to open the file: if you select only the first column, it will open fine. If you select all columns, the second one will cause a problem and no data will be displayed.

Screenshots
Attached screenshot.

parq-viewer-2 7 1 0-bug-screenshot

Additional context
Original column definition from Oracle database:

Limit type        NOT NULL VARCHAR2(20)
Limit period in days                NUMBER

[BUG] Doesn't work with multiple row groups

Parquet Viewer Version
2.2

Where was the parquet file created?
C#

Sample File
Test.zip

Describe the bug
In the UtilityMethods class, this line contains a bug after the first call:

image

For example, suppose I have two row groups with 2 rows each.
On the first call the check is if (rowIndex=0 >= readRecords=2) - OK.
But on subsequent calls it becomes if (rowIndex=2 >= readRecords=2) and it breaks, unless the second row group is bigger than the first; and even then it is buggy, since it will skip rows.
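
To illustrate the point, a hedged sketch of reading a bounded window of rows across row groups by tracking a global offset (assuming the parquet-dotnet API; this is not ParquetViewer's actual code):

    using System;
    using System.Threading.Tasks;
    using Parquet;

    // Read up to 'count' rows starting at 'offset', spanning row groups.
    static async Task ReadWindowAsync(string path, long offset, long count)
    {
        using var reader = await ParquetReader.CreateAsync(path);
        long seen = 0, read = 0;
        for (int g = 0; g < reader.RowGroupCount && read < count; g++)
        {
            using var rowGroup = reader.OpenRowGroupReader(g);
            long rows = rowGroup.RowCount;
            if (seen + rows <= offset) { seen += rows; continue; } // group is before the window
            long start = Math.Max(0, offset - seen);               // first wanted row in this group
            long take = Math.Min(rows - start, count - read);      // rows to keep from this group
            // ... read this group's columns and keep rows [start, start + take) ...
            read += take;
            seen += rows;
        }
    }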

After fixing this issue, I also found another problem, where the row count is not respected after the first row group:

image

Screenshots
image

Additional context
Add any other context about the problem here.

Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.

Error opening .parquet file

Larger .parquet files give an "ArgumentOutOfRangeException" error for me, just like the one mentioned in closed ticket #5.

For me it isn't an urgent issue, as I transitioned to .sqlite files for some other technical reasons, but it might be useful for you or other users, and you asked for example files in the now-closed tickets.

The .parquet file was created using pyarrow in Python, and I tested in ParquetViewer 1.1:
https://drive.google.com/open?id=12vIw3f5tMURfzhI6hO3clHtPafPwhqnq

ParquetViewer_2019-07-24_09-27-55

[BUG] very low/high dates/timestamps (0001-01-01 and 9999-12-31 23:59:59.9999) cause problems

First of all: I'm not sure if my problem is a problem of ParquetViewer or of parquet-dotnet; please let me know if I'm wrong here...

Parquet Viewer Version
v2.3.6

Where was the parquet file created?
Apache Spark 3.1.2

Sample File
PySpark code for creating the file

import datetime

df = sc.parallelize([
    [
        1,
        datetime.date(1985, 12, 31),
        datetime.date(   1,  1,  2),
        datetime.date(9999, 12, 31),
        datetime.date.max,
        datetime.date.min,
        datetime.datetime(1985,  4, 13, 13,  5),
        datetime.datetime(   1,  1,  2,  0,  0),
        datetime.datetime(9999, 12, 31, 23, 59, 59),
        datetime.datetime.max,
        datetime.datetime.min
    ]
]).toDF((
    "ID",
    "Date_Normal",
    "Date_Low",
    "Date_High",
    "Date_Max",
    "Date_Min",
    "Timestamp_Normal",
    "Timestamp_Low",
    "Timestamp_High",
    "Timestamp_Max",
    "Timestamp_Min"
))

display( df )


spark.conf.set('spark.sql.legacy.parquet.int96RebaseModeInWrite', 'CORRECTED')
spark.conf.set('spark.sql.legacy.parquet.datetimeRebaseModeInWrite', 'CORRECTED')

(df.coalesce(1)
  .write
  .mode('overwrite')
  .format('parquet')
  .save('tmp/spark_datetime/')
)

part-00000-f85e122f-806f-4375-91da-04de38bc0c9c-c000.snappy.parquet.zip

Describe the bug
When a Parquet file contains very low or very high date and timestamp values, this causes trouble:

  • Date 0001-01-01 is displayed as a blank cell (column "Date_Min")
  • Timestamp 0001-01-01 00:00:00.000000 is displayed as a blank cell (column "Timestamp_Min")
  • Timestamp 9999-12-31 23:59:59.999999 causes an exception (column "Timestamp_Max")
    • System.AggregateException, System.ArgumentOutOfRangeException
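
For context (an observation about the .NET side, not a confirmed root cause): these values sit exactly at the bounds of System.DateTime, so any rounding or time-zone adjustment during conversion can push them out of range and throw ArgumentOutOfRangeException:

    using System;

    Console.WriteLine(DateTime.MinValue.ToString("o")); // 0001-01-01T00:00:00.0000000
    Console.WriteLine(DateTime.MaxValue.ToString("o")); // 9999-12-31T23:59:59.9999999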

Screenshots
image

Additional context
The problem might be related to https://issues.apache.org/jira/browse/SPARK-31404.
Spark changed calendar between Spark 2.4 and 3.0.

Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.

[FEAT] Adjust column size to data/column name

Describe the feature you'd like to be added to Parquet Viewer
It would be great to be able to automatically adjust the size of all columns to the size of the data and/or the column name. When opening Parquet files with lots of double-type columns, you have to adjust each column individually to be able to see the whole number (with the E-# at the end). That way you could easily view column values and compare them between columns.

Adjusting by column contents would make each column's size match the maximum length of its values.
Adjusting by column names would make each column's size match the length of its name.
Adjusting by column names and contents would make each column's size match the larger of the two.

Share why this feature would be a good addition to the utility
It will improve usability and data readability

Screenshots
I will try to put here a proposal for the menu items:

Tools
|  Adjust columns > | Column contents
                    | Column names
                    | Column names and contents
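
If it helps, the three actions map directly onto built-in WinForms sizing modes; a hedged sketch (grid stands for the viewer's DataGridView):

    using System.Windows.Forms;

    // Size columns to cell contents only.
    static void AdjustToContents(DataGridView grid) =>
        grid.AutoResizeColumns(DataGridViewAutoSizeColumnsMode.AllCellsExceptHeader);

    // Size columns to the header (column name) only.
    static void AdjustToNames(DataGridView grid) =>
        grid.AutoResizeColumns(DataGridViewAutoSizeColumnsMode.ColumnHeader);

    // Size columns to the larger of the header and the cell contents.
    static void AdjustToNamesAndContents(DataGridView grid) =>
        grid.AutoResizeColumns(DataGridViewAutoSizeColumnsMode.AllCells);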


I just found your tool and I think it's great!!

Thank you for the effort!

Best regards,

Carlos

Alternating rows are displaying with incorrect values

Using Release 2.1 Binary download.

Problem:
I have a parquet file with 30 rows (attached as a file with a .parquet extension, then zipped:
Rows_5036221_5036251.zip
). When I load it into ParquetViewer, every 2nd row is incorrect. The first column is titled i (columns i, j, k are coordinates). Every 2nd row of the i column displays a 0 instead of the value it's meant to have (j and k are all meant to have the same value). The value it's meant to have in i is then pushed to the next row. So for example:
I have a column of
156,
157,
158,
159,
160

in ParquetViewer I get:
156,
0,
157,
0,
158,

For the rest of the columns, some of them have correct values for their row position and some don't.
E.g.
rows 20 and 21 (counting from 1) contain "LMS1" in column V1. This is correct.
But in the same rows, S6 contains 0 when it should be "5".

In this file the S columns are a status for a null value replacement string. If the S column is 0, then the V column of the same number should contain a value, but if the S column contains a non 0 value, the V column should be blank.

I've compared the results with the similar tool "BigdataFileViewer" at https://github.com/Eugene-Mark/bigdata-file-viewer and also with my devs' output while debugging the parquet file creation (they are using pyarrow to import a CSV into a parquet file). Screenshots of the results are attached.

*Note that BigdataFileViewer has other issues with this file. It errors on the first attempt to load (complaining about incorrect magic numbers), but then loads correctly on a subsequent attempt.

ParquetViewer
image

BigFileDataViewer
image

[BUG] How to open the app in Windows?

Parquet Viewer Version
What version of Parquet Viewer are you experiencing the issue with?

Where was the parquet file created?
Apache Spark, Hive, Java, C#, pyarrow, etc.

Sample File
Upload a sample file so the issue can be debugged!

Describe the bug
A clear and concise description of what the bug is.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.

V2.3.1 - issues with .exe in Virustotal

Parquet Viewer Version
2.3.1

Where was the parquet file created?
no Parquet file involved

Sample File
no Parquet file involved

Describe the bug
Scanned ParquetViewer.exe with VirusTotal and it flagged the issues shown in the screenshot.
Please check and advise whether this is really an issue or a false positive.

Screenshots
image

Additional context

Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.

[BUG] Cannot open file because of missing column, but column is present.

Parquet Viewer Version
What version of Parquet Viewer are you experiencing the issue with?
2.5.1.0

Where was the parquet file created?
Pandas Python - 1.5.3

Sample File
Example.zip

Describe the bug
Cannot open the file in ParquetViewer, yet the file can be opened in Python, and the column flagged as missing is present. The failure is likely because of a trailing "." in the column name (as that is the only mismatch between the file and the bug report).

Screenshots
If applicable, add screenshots to help explain your problem.
image

[FEAT] Search text in multiple Parquet files in one folder

Hello,
Would it be possible, in the new version, to include the ability to search for the same text in multiple Parquet files? Currently, all files must be opened one by one to search for a text, and it takes a lot of time. The ability to search for a text in several Parquet files inside a folder would be really useful; right now I have to open files one by one, wait for loading and searching, and then move to the next file. If you add this feature, it will be really great.
With respect

Application Error when trying to open parquet file

Parquet Viewer Version
ParquetViewer_SelfContained.exe, 2.7.1.0

Where was the parquet file created?
unknown

Sample File
https://huggingface.co/datasets/Gaivoronsky/hh-rlhf-ru-rl/tree/main/data

Describe the bug
When the user opens the file, an error is emitted. No stack trace is given.

Screenshots
Standard windows dialog saying "Application stopped working"

Additional context
Error emitted when trying to open file:
https://huggingface.co/datasets/Gaivoronsky/hh-rlhf-ru-rl/tree/main/data

Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.

[BUG] sbyte and byte types swapped

Parquet Viewer Version
Version 2.7.0.0.

Where was the parquet file created?
My own tool

Describe the bug
In the Parquet viewer, SByte and Byte are mixed up.

Your code:

    private static Type ParquetNetTypeToCSharpType(Parquet.Thrift.SchemaElement thriftSchema, Parquet.Schema.DataType type)
    {
        Type columnType;
        switch (type)
        {
            // ---removed some lines---
            case Parquet.Schema.DataType.Byte:
                columnType = typeof(sbyte);    // should be typeof(byte), as a .NET byte is unsigned
                break;
            // ---removed some lines---
            case Parquet.Schema.DataType.SignedByte:
                columnType = typeof(byte);     // should be typeof(sbyte), as a .NET byte is unsigned
                break;
            // ---removed some lines---
        }

        return columnType;
    }

At least this fix solved the problem in my test.

Kind regards,
Maurice

[BUG] Handling Files with many columns

Parquet Viewer Version
2.5.1.0

Where was the parquet file created?
pyarrow

Describe the bug
I am working with parquet files with many columns (e.g. 30'000).
There is seemingly no efficient way to limit the columns; a "deselect all" option would be very helpful.
Even for small parquet files with many columns (e.g. 3 MB, 239 rows, 3643 columns), the data load takes a long time (3-5 min, with massive CPU usage and low memory consumption).
For larger files (e.g. 12 MB, 239 rows, about 10'000 columns), the whole system freezes already during the file-analysis stage.

Is there any way to work with files with many columns?

[FEAT] Display null values as NULL in italic (or something else)

Describe the feature you'd like to be added to Parquet Viewer
When the value is null, display the word NULL in italic

Share why this feature would be a good addition to the utility
Currently it is impossible to distinguish between an empty string and a NULL value.

[FEATURE-REQUEST] Ability to open partitioned files

Parquet Viewer Version

2.3.1.41849

Where was the parquet file created?

Pandas -> pyarrow

dfStore.to_parquet(BUILDINGS_OUTPUT_FILE, partition_cols= ["type"])

Sample File

pv_bugdemo.parquet.zip

Describe the bug

A partitioned file that is actually a folder with several subfiles should be supported. This probably involves checking whether the "file" is actually a directory and then traversing the tree to read the individual constituent files.
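
A hedged sketch of that traversal (assuming the parquet-dotnet API; the merge step is elided):

    using System;
    using System.IO;
    using Parquet;

    string path = "pv_bugdemo.parquet";   // could be a single file or a partitioned folder
    string[] files = Directory.Exists(path)
        ? Directory.GetFiles(path, "*.parquet", SearchOption.AllDirectories)
        : new[] { path };

    foreach (string file in files)
    {
        using var reader = await ParquetReader.CreateAsync(file);
        Console.WriteLine($"{file}: {reader.RowGroupCount} row group(s)");
        // ... merge the rows of each constituent file into one view ...
    }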

Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.

[FEATURE REQUEST] Display timestamp fields in human-intelligible format

ParquetViewer is the best parquet file viewer. However, it shows 'timestamp' fields in Unix epoch (or Unix time or POSIX time or Unix timestamp), i.e., the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT). For example,

image

The 'date' field is displayed just fine. It is requested that the 'date-time' field be made viewable in a human-intelligible format.

Meanwhile, I've posted a request at https://stackoverflow.com/questions/68741520/date-time-timestamp-field-in-parquet-file-shown-as-numbers-in-parquet-file-vie to see if something (like a formatting change) could be done in the R environment before exporting to the parquet format, so that the 'timestamp' field displays in a human-intelligible format in ParquetViewer.
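
For reference, the conversion itself is a one-liner in .NET; a sketch with a hypothetical cell value (this assumes the stored unit is seconds; Parquet timestamps can also be millis or micros):

    using System;

    long epochSeconds = 1628686800;                    // hypothetical cell value
    var ts = DateTimeOffset.FromUnixTimeSeconds(epochSeconds);
    Console.WriteLine(ts.UtcDateTime);                 // 2021-08-11 13:00:00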

Time Resolution

Hi,

Thanks for making the tool. The time data being displayed appears to omit anything finer than a minute.
Is it possible to add higher resolution to the data viewer?

[FEAT] Time Decimals (milliseconds) in CSV export

When opening a file, the viewer shows the time format to many decimal places, but when exporting the same data it does not "honour" the time format and only saves down to whole seconds.

Would you please kindly, maybe one day, make the export function also put the extra decimal figures in the CSV?

hugs and kisses
Amin

Describe the feature you'd like to be added to Parquet Viewer
Provide as much detail as possible.

Share why this feature would be a good addition to the utility
Will it improve usability, reliability or some other aspect?

Screenshots
Any screenshots describing how the feature would look is a plus.

Note: There are no guarantees your feature will be implemented.

[FEATURE REQUEST] Ability to read parquet files compressed with "zstd"

Parquet files are nowadays often saved with ZSTD compression, since its decompression speed, and hence reading speed, is very significantly faster than the GZIP standard currently supported by ParquetViewer.

We regularly use Apache Arrow and plan to use ZSTD compression instead of GZIP as documented in the webpage https://arrow.apache.org/docs/r/reference/write_parquet.html. It would be great if we could continue to use ParquetViewer.

Currently attempting to open ZSTD compressed parquet file results in an error as shown in this screenshot:
image

It is requested that support for ZSTD compression be incorporated in future versions of ParquetViewer.

Error: Sum of the columns FillWeight values cannot exceed 65535, when trying to load a parquet file with a large number of columns

Version: 2.2.7625.41329
Parquet file created by pyarrow

Sample File (Note that the data in this file is junk data so in this case the values are mostly meaningless) :
NewRows_1_30_2009cols.zip

Problem:
When I try to load a parquet file that contains > 4000 columns, I receive the error "Sum of the columns FillWeight values cannot exceed 65535" when it tries to add the columns to the DataGridView forms object.
The grid view seems to load ~656 of my 4666 columns into the header row before it throws the error, and then ceases to load anything else into the grid.

To replicate:

  1. Try to open the attached parquet file.

image

I have successfully been able to open the file with another parquet viewer (within an IDE) so it seems that the parquet file is valid.
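
For context, this is a documented WinForms limit: DataGridView columns default to FillWeight = 100, and the sum of all FillWeight values must stay at or below 65535, which is why loading stops near column 655. A hedged sketch of a workaround when adding columns (not ParquetViewer's actual code):

    using System.Windows.Forms;

    // Lower each column's FillWeight so thousands of columns fit under the 65535 cap.
    static void AddColumn(DataGridView grid, string name) =>
        grid.Columns.Add(new DataGridViewTextBoxColumn
        {
            Name = name,
            HeaderText = name,
            FillWeight = 1   // instead of the default 100
        });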

image
image

Path Separator issue

Hi,

I would like to understand which special characters are used as the path separator, as I am getting the error below. Attaching the column list for your reference. Kindly assist.

image

Lead_Test1.zip

[FEATURE-REQUEST] Support BYTE_ARRAY Decimal format

Describe the feature you'd like to be added to Parquet Viewer
Support the Decimal format so that it shows proper text instead.

For a decimal field like the one below:
{
"Field_id": 0,
"Name": "CD_ID",
"Type": "BYTE_ARRAY",
"Type_length": 0,
"LogicalType": "LogicalType(DECIMAL: DecimalType(, Scale: 12, Precision: 38))",
"Scale": 12,
"Precision": 38,
"Repetition_type": "OPTIONAL",
"Converted_type": "DECIMAL"
},

GUI shows:
Weird characters

For these characters, even copy-paste is not possible.
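
For background, Parquet stores DECIMAL as a big-endian two's-complement unscaled integer, which is why the raw bytes render as garbage. A hedged decoding sketch (the reported field has scale 12; note System.Decimal holds only ~28-29 digits, so precision-38 values can still overflow):

    using System;
    using System.Numerics;

    static decimal DecodeDecimal(byte[] bytes, int scale)
    {
        var unscaled = new BigInteger(bytes, isUnsigned: false, isBigEndian: true);
        return (decimal)unscaled / (decimal)BigInteger.Pow(10, scale);
    }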

Additional feature/bug request:
Integers get a thousands separator by default. It should not be added by default, as settings do not persist and I need to uncheck the option each time I open a file.

Thanks.
2021-10-27 11_14_29-Parquet Metadata Viewer

View split files

I've found some data in files named like XXXXX_UserData_0001.parquet up to XXXXX_UserData_0005.parquet.
When opening any of the files I get this error:
image
Is it possible to open any of these files?

Add support for List type fields

Parquet Viewer Version
2.3.0.40676

Where was the parquet file created?
AWS EMR > Apache Spark

Sample File
Can't provide one; it's 22 MB with sensitive information and I don't have control of the producer.

Describe the bug
The Parquet file contains a field which is a nested table (an array of strings), so I guess ParquetViewer is unable to open these types of parquets.
I can't share the parquet data file, but I'm attaching a detailed error popup screenshot; basically it's referring to the nested field.

Screenshots
parquet-error-nestedtable
parquet-error-nestedtable2

Additional context

Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.

[BUG] Error when opening file containing columns of LIST type

Parquet Viewer Version
2.5.1

Where was the parquet file created?
pyarrow

Sample File
test_file_20230120_common_ts.zip

Describe the bug
The Parquet Viewer is not able to open a file that contains columns of LIST type. In the attached file, "object_data__obj_angles" is a column of LIST type, but as you can see in the screenshot, the viewer complains that this column doesn't exist (I don't know why). However, v2.4, mentioned at #33 (comment), is able to load this file without issues.

Screenshots
v2.5.1
image

v2.4
image

Runs out of Memory

Runs out of memory when opening files larger than 10-20 MB. It uses massive amounts of memory and becomes unusable with anything but the smallest of files.
