Comments (11)
Happy to have a look at this. Just had a quick look - under the bonnet this calls CsvFile.Load in FSharp.Data; you can get this error with that if you try to read the stream after you've reached the end of it, but it looks like in Deedle the result of Load is then passed to Cache(), which would prevent this happening...
from deedle.
If you could have a look that would be awesome!
- I think we actually iterate over the data when doing the inference before calling Cache: https://github.com/BlueMountainCapital/Deedle/blob/master/src/Deedle/FrameUtils.fs#L415
- The Cache call happens only after that: https://github.com/BlueMountainCapital/Deedle/blob/master/src/Deedle/FrameUtils.fs#L424
So I suppose this would work if you set inferTypes=false (worth testing...). In principle we should call Cache before that, but when I wrote the code (using older F# Data), there was something I wasn't able to do on the cached data (don't remember what).
Also, does this work when you just specify http://... as the file name? If not, supporting that would be great too!
OK - will check that out :-)
There was a fix in FSharp.Data recently that might solve this: fsprojects/FSharp.Data#562
OK. Deedle should probably cache the data before doing the schema inference anyway, though (or just turn it off, as @tpetricek suggests)?
Deedle needs to go over the data just two times - for inference and to load it.
It would be great if we could cache the data across these two passes. It would also be great to change the default inferRows count to, say, 100 as suggested here.
I was suggesting to turn the inference off just for testing (so that we can find out whether the problem is really just the second pass during the inference).
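To make the two-pass problem concrete, here is a minimal, language-neutral sketch in Python (not Deedle or FSharp.Data code). The names infer_types and one_shot_rows and the sample data are hypothetical. It assumes the rows come from a forward-only source: once the inference pass has consumed them, the load pass finds nothing left, while caching the rows first lets both passes see the same data. In this sketch the second pass is silently empty; the thread reports that FSharp.Data raises an error instead.

```python
import csv
import io
from itertools import islice

CSV_TEXT = "a,b\n1,x\n2,y\n3,z\n"  # hypothetical sample data


def infer_types(rows, infer_rows=100):
    """Toy inference: a column is 'int' if every sampled value is all digits."""
    sample = list(islice(rows, infer_rows))  # consumes up to infer_rows rows
    return ["int" if all(v.isdigit() for v in col) else "str"
            for col in zip(*sample)]


def one_shot_rows():
    """Return a forward-only iterator over the data rows (header skipped)."""
    reader = csv.reader(io.StringIO(CSV_TEXT))
    next(reader)  # skip the header line
    return reader


# Without caching: the inference pass exhausts the iterator.
rows = one_shot_rows()
schema_uncached = infer_types(rows)
loaded_uncached = list(rows)  # nothing left for the load pass

# With caching: materialise the rows once, then run both passes over the copy.
cached = list(one_shot_rows())
schema_cached = infer_types(iter(cached))
loaded_cached = list(cached)
```

Capping the sample with infer_rows only limits how much the inference pass reads; it does not make a forward-only source re-readable, which is why the caching step (or a second read) is still needed.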
Oh right. If it's using all the contents to infer the schema then yes, we should limit that too, I would imagine.
So, as you suspected, turning off inference fixes the problem (which is no surprise, as I can repro this error on FSharp.Data 2.0.8 by trying to read past the end of the stream). You can't do the caching before the inference because we start with a CsvFile, but Cache() returns CsvFile<'RowType>; the two types don't appear to be compatible with one another, insofar as InferColumnTypes is a method on CsvFile but not on CsvFile<'RowType>.
I'll try rebuilding Deedle against the latest FSharp.Data rather than 2.0.8 (I'm assuming that the fix above isn't included in 2.0.8?).
@isaacabraham Yes! I think this was the trouble I was having too. If you can work with @ovatsus and support this using a new version of F# Data, that would be fantastic!
So I've put a fix in place in the meantime by forcing two reads, one for the schema and another for the actual data, and submitted a pull request (#227). I don't think we can get around this until FSharp.Data supports caching as described above.
I've also reduced the number of rows used for schema generation to 100.
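For readers who want the shape of that workaround, the following is a rough Python sketch of the general idea only. It is not the code from the pull request and does not use the Deedle or FSharp.Data APIs. The function load_with_two_reads and its open_source parameter (a zero-argument callable that returns a fresh text stream each time) are hypothetical, and the comma splitting is deliberately naive.

```python
import io
from itertools import islice


def load_with_two_reads(open_source, infer_rows=100):
    """Open the source twice: once to sample rows for a schema, once to load."""
    # Read 1: header plus at most infer_rows rows, used only for inference.
    with open_source() as stream:
        header = stream.readline().rstrip("\n").split(",")
        sample = [line.rstrip("\n").split(",")
                  for line in islice(stream, infer_rows)]
    # Toy schema: a column is numeric if every sampled value is all digits.
    is_int = [all(row[i].isdigit() for row in sample)
              for i in range(len(header))]

    # Read 2: a fresh stream, so nothing consumed by read 1 is missing.
    with open_source() as stream:
        stream.readline()  # skip the header again
        data = [[int(v) if is_int[i] else v
                 for i, v in enumerate(line.rstrip("\n").split(","))]
                for line in stream]
    return header, data


# Usage with an in-memory source standing in for a file or URL.
header, data = load_with_two_reads(
    lambda: io.StringIO("id,name\n1,ann\n2,bob\n"))
```

The trade-off is that the source is opened twice, which costs an extra request for a remote file, and a schema inferred from a limited sample can be wrong for rows beyond it.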
Closing as the script works now.