
Comments (11)

isaacabraham commented on May 28, 2024

Happy to have a look at this. Just had a quick look - under the bonnet this calls CsvFile.Load in FSharp.Data; you can get this error with that if you try to read the stream after you've reached the end of it, but it looks like in Deedle the result of Load is then passed through Cache(), which would prevent this happening...

from deedle.

tpetricek commented on May 28, 2024

If you could have a look that would be awesome!

So I suppose this would work if you set inferTypes=false (worth testing...). In principle we should call Cache before that, but when I wrote the code (using older F# Data), there was something I wasn't able to do on the cached data (don't remember what).

Also, does this work when you just specify http://... as the file name? If not, supporting that would be great too!


isaacabraham commented on May 28, 2024

OK - will check that out :-)


ovatsus commented on May 28, 2024

There was a fix in FSharp.Data recently that might solve this: fsprojects/FSharp.Data#562


isaacabraham commented on May 28, 2024

OK. Deedle should probably cache the data before doing the schema inference anyway, though (or just turn it off as @tpetricek suggests)?


tpetricek commented on May 28, 2024

Deedle needs to go over the data just two times - for inference and to load it.

It would be great if we could cache the data in these two passes. It would also be great to change the default inferRows count to, say, 100 as suggested here.

I was suggesting to turn the inference off just for testing (so that we can find out whether the problem is really just the second pass during the inference).
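As a language-agnostic illustration of the "cache once, then do both passes" idea, here is a minimal Python sketch. It is not Deedle or FSharp.Data code; `load_cached`, `infer_column_types`, and the type-guessing rules are hypothetical and only stand in for the real inference.

```python
import csv
import io
from itertools import islice


def infer_column_types(rows, infer_rows=100):
    """Guess one type name per column from at most `infer_rows` rows (hypothetical helper)."""
    def guess(values):
        # Try the narrowest type first; fall back to "string" if nothing fits.
        for cast, name in ((int, "int"), (float, "float")):
            try:
                for v in values:
                    cast(v)
                return name
            except ValueError:
                continue
        return "string"

    sample = list(islice(rows, infer_rows))
    return [guess(column) for column in zip(*sample)]


def load_cached(stream, infer_rows=100):
    """Read a forward-only stream exactly once, then run both passes on the cached rows."""
    reader = csv.reader(stream)
    header = next(reader)
    rows = list(reader)                           # the only read of the stream
    types = infer_column_types(rows, infer_rows)  # pass 1: inference, on the cache
    return header, types, rows                    # pass 2 (loading) would reuse `rows`


stream = io.StringIO("id,price,name\n1,2.5,a\n2,3,b\n")
header, types, rows = load_cached(stream)
print(header, types, len(rows))
```

Because the stream is consumed only once, the second pass never tries to read past its end; the cost is that every row is held in memory.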


isaacabraham commented on May 28, 2024

Oh right. If it's using all the contents to infer the schema, then yes, we should limit that too, I would imagine.


isaacabraham commented on May 28, 2024

So, as you suspected, turning off inference fixes the problem (which is no surprise, as I can repro this error on FSharp.Data 2.0.8 by trying to read past the end of the stream). You can't do the caching before the inference because we start with a CsvFile, but Cache() returns CsvFile<'RowType>; the two types don't appear to be compatible with one another, insofar as InferColumnTypes is a method living on CsvFile but not on CsvFile<'RowType>.

I'll try rebuilding Deedle against the latest FSharp.Data rather than 2.0.8 (I'm assuming that the fix above isn't included in 2.0.8?).


tpetricek commented on May 28, 2024

@isaacabraham Yes! I think this was the trouble I was having too. If you can work with @ovatsus and support this using a new version of F# Data, that would be fantastic!


isaacabraham commented on May 28, 2024

So I've put a fix in place in the meantime by forcing two reads, one for the schema and another for the actual data, and submitted a pull request (#227). I don't think we can get around this until FSharp.Data supports caching as described above.

I've also reduced the number of rows used for schema generation to 100.
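The shape of that workaround can be sketched roughly as follows, in Python rather than F#. This is not the code from the pull request; `read_frame`, `open_stream`, and the numeric-only inference are hypothetical, and the sketch assumes the source can be opened more than once.

```python
import csv
import io
from itertools import islice


def _is_float(text):
    """Return True if `text` parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def read_frame(open_stream, infer_rows=100):
    """`open_stream` is a zero-argument callable returning a fresh text stream on each call."""
    # Read 1: schema only, looking at no more than `infer_rows` data rows.
    with open_stream() as s:
        reader = csv.reader(s)
        header = next(reader)
        sample = list(islice(reader, infer_rows))
    numeric = [all(_is_float(v) for v in column) for column in zip(*sample)]

    # Read 2: a fresh stream for the actual data, so nothing reads past the end.
    # Note: float() raises if a row beyond the sample breaks the inferred schema.
    with open_stream() as s:
        reader = csv.reader(s)
        next(reader)  # skip the header row
        data = [[float(v) if is_num else v for v, is_num in zip(row, numeric)]
                for row in reader]
    return header, data


text = "x,label\n1,a\n2,b\n"
header, data = read_frame(lambda: io.StringIO(text))
print(header, data)
```

Opening the source twice avoids holding the whole file in memory for inference, at the price of a second download or disk read.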


zyzhu commented on May 28, 2024

Closing, as the script works now.

