datareader's People

Contributors

gmeixiong, kshedden, mzimmerman

datareader's Issues

Writer?

Does this library only read Stata .dta files? Is it possible to write them?

Thanks!

Unexpected non-zero end_of_first_byte

I am getting two types of errors when reading .sas7bdat files:

  • Unexpected non-zero end_of_first_byte
  • 32 Character byte unknown

Any ideas on what's going on and how to handle these errors?

Benchmarks?

Hello!

I just came across this package. I'm aware that Go is generally very fast. Do you happen to know how the read speeds for Stata files with this package compare to Stata or Python? Is the reading multithreaded?

Also, "simple column-oriented data container" caught my eye. I'm especially curious whether this data structure is similar to one that can be written by the parquet-go package. Since Parquet is a column-oriented file format, I'm guessing that reading Stata files with your package and writing them with parquet-go could be much faster than my current Python code for that conversion.

SAS7BDAT.read_next_page does not handle EOF.

(This is probably true for other file types, but I haven't tested them.)

In sas7bdat.go:818, the code assumes that any non-nil error should void the read, but that is not true for io.EOF. The golang io package says explicitly:
"""
// When Read encounters an error or end-of-file condition after
// successfully reading n > 0 bytes, it returns the number of
// bytes read. It may return the (non-nil) error from the same call
// or return the error (and n == 0) from a subsequent call.
// An instance of this general case is that a Reader returning
// a non-zero number of bytes at the end of the input stream may
// return either err == EOF or err == nil.
"""

Thus, if a File interface chooses to return EOF immediately on the last, positive-byte-sized read rather than waiting to return EOF with a 0-byte read (as the s3 filesystem I am using does), then this breaks the SAS7BDAT reader.

I believe a simple fix would be to change the line (sas7bdat.go:818) to:

	if err != nil && err != io.EOF {

Would that work?

TrimStrings truncates data

I often use the stattocsv command line utility to work with data using the GNU utilities and I noticed that some of my columns with addresses, e.g., "50 E. Main St" only came out as "50".

I then wrote my own frontend using your library to spit out as CSV and it didn't have that issue.

Debugging some, I found it was the TrimStrings bool value causing the problem but I didn't debug further.

Support various type conversions.

haven::read_sas is able to correctly parse "numeric" columns into integer or boolean columns. It's still not clear to me how haven::read_sas becomes aware of the underlying type, though if I had to guess it is related to how many bytes are stored for each column value.

It would be nice if datareader were able to similarly infer type and distinguish ints and bools from floats.

First few rows of Data() method has wrong offset

I am reading a sas7bdat file that represents a table with ~15K rows. The reader gives the correct column formats, names, and labels. For the data itself, the first 18 rows are incorrect in that columns have incorrect offsets while the rest is correct.

Panic

goroutine 1 [running]:
github.com/kshedden/datareader.(*SAS7BDAT).processByteArrayWithData(0xc0000e4000, 0x2fe8, 0x90, 0xc0000ead20, 0x40b7c8)
        /afs/umich.edu/user/k/s/kshedden/go/src/github.com/kshedden/datareader/sas7bdat.go:1190 +0x6b3
github.com/kshedden/datareader.(*SAS7BDAT).readline(0xc0000e4000, 0x1f40, 0x1f40, 0xc000108000)
        /afs/umich.edu/user/k/s/kshedden/go/src/github.com/kshedden/datareader/sas7bdat.go:813 +0x31d
github.com/kshedden/datareader.(*SAS7BDAT).Read(0xc0000e4000, 0x3e8, 0x13, 0x13, 0x0, 0x0, 0x1d)
        /afs/umich.edu/user/k/s/kshedden/go/src/github.com/kshedden/datareader/sas7bdat.go:660 +0x389
main.doConversion(0x50e6c0, 0xc0000e4000)
        /afs/umich.edu/user/k/s/kshedden/go/src/github.com/kshedden/datareader/cmd/stattocsv/main.go:31 +0x94c
main.main()
        /afs/umich.edu/user/k/s/kshedden/go/src/github.com/kshedden/datareader/cmd/stattocsv/main.go:144 +0x24a
