This project provides libparquetfile, a C++ library which can generate parquet files.
Additionally, the proto2parq application is provided, which converts data files or streams containing protobuf-defined records into parquet format.
The early development on this project was inspired by Neal Sidhwaney's cpp-parquet project.
Some of the parquet-writing C++ components are extracted from the Impala Database Project.
You'll need the Thrift development tools installed:
sudo dnf install thrift-devel
Update the parquet-format submodule:
git submodule update --init
Build with GNU make (gmake) from the top-level directory:
gmake
The sample program generates the sample data described in the Dremel paper and the parquet-mr annotation document in protobuf format.
The output of this program can be piped into proto2parq and converted to parquet format:
cd sample/OBJDIR
./sample | ../../proto2parq/OBJDIR/proto2parq --outfile=sample.parquet
The protobuf schema is prepended to the protobuf data output.
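For readers who do not have the Dremel paper at hand, the two sample records it describes (r1 and r2) can be sketched in plain C++ as shown below. This is only an illustration of the nested, repeated and optional fields involved. The struct and function names (Document, Links, Name, Language, make_r1, make_r2) are hypothetical and are not the classes generated by protoc or used by the sample program. It assumes a C++17 compiler for std::optional.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical plain structs mirroring the Document schema from the
// Dremel paper. They are NOT the protobuf classes used by this project.
struct Language {
    std::string code;                    // required string Code
    std::optional<std::string> country;  // optional string Country
};

struct Name {
    std::vector<Language> languages;     // repeated group Language
    std::optional<std::string> url;      // optional string Url
};

struct Links {
    std::vector<std::int64_t> backward;  // repeated int64 Backward
    std::vector<std::int64_t> forward;   // repeated int64 Forward
};

struct Document {
    std::int64_t doc_id = 0;             // required int64 DocId
    std::optional<Links> links;          // optional group Links
    std::vector<Name> names;             // repeated group Name
};

// Record r1 from the Dremel paper.
Document make_r1() {
    Document d;
    d.doc_id = 10;
    d.links = Links{{}, {20, 40, 60}};   // no Backward, three Forward

    Name first;
    first.languages.push_back(Language{"en-us", std::string("us")});
    first.languages.push_back(Language{"en", std::nullopt});
    first.url = "http://A";

    Name second;                         // no Language entries
    second.url = "http://B";

    Name third;                          // no Url
    third.languages.push_back(Language{"en-gb", std::string("gb")});

    d.names = {first, second, third};
    return d;
}

// Record r2 from the Dremel paper.
Document make_r2() {
    Document d;
    d.doc_id = 20;
    d.links = Links{{10, 30}, {80}};

    Name only;                           // Url but no Language entries
    only.url = "http://C";
    d.names = {only};
    return d;
}
```

The empty and missing fields (for example the second Name in r1, which has a Url but no Language) are the cases that make the parquet repetition and definition levels interesting, which is why this data set is a common test input.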