
Comments (3)

chris920820 commented on September 27, 2024

To be honest, when I try to write a large test file in my pipeline, I often get a panic (I have hit both an index-out-of-bounds error and a nil pointer dereference). It could be either a user error or a bug in the code. Either way, a panic makes debugging extremely hard, and it kills the main process (unless you use recover, which I don't think is good practice anyway). I'd like to suggest that these calls return an error instead, something like

err := pw.Write(rec)
// and similarly
err = pw.Flush(true)

I know it may take a fair amount of refactoring and testing to cover the panics, but I believe it would make the code more reliable.
I will keep testing on that pipeline to find out whether it is a user error or a code bug.
Thanks for your help :)
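
Until the writer returns errors itself, a caller-side stopgap is to turn a panic into an ordinary error with recover. The sketch below is only an illustration of that pattern: safeCall and riskyWrite are hypothetical names, and riskyWrite merely stands in for a library call that may panic; neither is part of parquet-go.

```go
package main

import "fmt"

// safeCall runs fn and converts any panic raised inside it into an error,
// so the caller can handle the failure instead of crashing the process.
func safeCall(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}

// riskyWrite is a hypothetical stand-in for a write call that may panic.
// It panics with an index-out-of-range error when i >= len(recs).
func riskyWrite(recs []int, i int) error {
	_ = recs[i]
	return nil
}
```

A call such as safeCall(func() error { return riskyWrite(recs, i) }) then yields a non-nil error instead of a crash. This hides the stack trace unless it is logged separately, which is one reason a proper error return from the library would still be preferable.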

from parquet-go.

xitongsys commented on September 27, 2024

Hi, @chris920820

  1. Always call Flush(true) once before you stop writing. The parameter is a little confusing. There are two buffers in the writer: one stores records, the other stores pages. When the size of the buffered records exceeds the page size, the record buffer is emptied and its contents are written to a page. When the size of the page buffer exceeds the row group size, the pages are written out as a row group, which means they are written to the file. That is why Flush takes a bool parameter: when it is false, only the record buffer is flushed; when it is true, both buffers are flushed.
    The whole process is controlled by PageSize (default 8K = 8*1024) and RowGroupSize (default 128M = 128*1024*1024), so users don't need to manage it themselves. However, you can't write too many records at once, because you are limited by available memory.
    If you want to change the default RowGroupSize, you can set
pw.RowGroupSize = 256*1024*1024

PS: I'm also considering calling Flush inside WriteStop, so users won't need to call it anymore.
  2. I will add more error handling in the new version. Sorry about the inconvenience.
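
To make the two-buffer behaviour described in point 1 concrete, here is a toy model. It is not the parquet-go implementation: toyWriter and all of its fields are hypothetical, and it only tracks byte counts so that the effect of Flush(false) versus Flush(true) is visible.

```go
package main

// toyWriter is a hypothetical model of the two-buffer scheme.
// It counts bytes only and never touches a real file.
type toyWriter struct {
	PageSize     int // threshold for the record buffer, in bytes
	RowGroupSize int // threshold for the page buffer, in bytes

	recordBytes int   // bytes currently held in the record buffer
	pageBytes   int   // bytes currently held in the page buffer
	RowGroups   []int // sizes of the row groups "written to the file"
}

// Write buffers one record and flushes the record buffer once it
// grows larger than PageSize.
func (w *toyWriter) Write(rec []byte) {
	w.recordBytes += len(rec)
	if w.recordBytes > w.PageSize {
		w.Flush(false)
	}
}

// Flush moves buffered records into the page buffer. The page buffer is
// written out as a row group when it exceeds RowGroupSize, or
// unconditionally when all is true.
func (w *toyWriter) Flush(all bool) {
	if w.recordBytes > 0 {
		w.pageBytes += w.recordBytes
		w.recordBytes = 0
	}
	if w.pageBytes > 0 && (all || w.pageBytes > w.RowGroupSize) {
		w.RowGroups = append(w.RowGroups, w.pageBytes)
		w.pageBytes = 0
	}
}
```

In this model, data that has not yet crossed the RowGroupSize threshold stays in memory until Flush(true) is called, which is why the final Flush(true) matters.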


chris920820 commented on September 27, 2024

@xitongsys
Hey!
Thanks for your kind reply! It was very clear.
Adding graceful error handling will certainly make the code more robust. I have now integrated this code into our production pipeline, and it is in the testing phase. Hopefully everything will work great :)
