Comments (7)
Are you able to share the file? Happy to take a look at what's going on here.
from csv.jl.
Sure, I sent you an invitation on your Gmail account. The file is inside that Dropbox folder.
Thanks!
@Unoqualsiasi, sorry it's taken so long to respond; my list of improvements to DataStreams & CSV was a tad long. I've been playing with this dataset the last few days, though, and it's a doozy. It's certainly non-traditional to have such a wide dataset. One question I had: is this dataset meant to be transposed? It seems the first column is all strings, like labels, and the rest of the columns are all integer codes, but the dataset has no header. Is that indeed the case? If it is, I have actually been working on a way to read in a csv file while transposing on the fly. For this csv file in particular, the read time goes down to 5-6s on my machine (vs. 70-80s on current 0.5 code).
So that dataset is actually a small example of what you find when working in the genomic field: a lot of columns with only integers (usually 0, 1 and 2). The first column is the ID (animal or person), and from the second column on you have the SNPs. Usually you don't need labels (a header) because they are stored in a different file.
Sometimes you need to transpose it, sometimes not. :D
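For readers unfamiliar with this layout, here is a minimal sketch in plain Julia (no packages) of parsing such a file: one individual per row, the ID first, then the SNP codes, no header. The function name `read_genotypes`, the comma delimiter, and the sample IDs are assumptions made for illustration; they are not taken from the dataset discussed here or from CSV.jl.

```julia
# Hypothetical helper (not part of CSV.jl): parse a header-less genotype file
# where column 1 is the ID and the remaining columns are SNP codes (0, 1, 2).
# The comma delimiter is an assumption; adjust `delim` for other files.
function read_genotypes(io::IO; delim::Char=',')
    ids = String[]
    rows = Vector{Int8}[]
    for line in eachline(io)
        isempty(strip(line)) && continue          # skip blank lines
        fields = split(line, delim)
        push!(ids, String(strip(fields[1])))      # first field is the ID
        push!(rows, [parse(Int8, f) for f in fields[2:end]])
    end
    # Stack the per-individual vectors into an individuals × SNPs matrix.
    genotypes = isempty(rows) ? Matrix{Int8}(undef, 0, 0) :
                permutedims(reduce(hcat, rows))
    return ids, genotypes
end

# Tiny in-memory example standing in for a real file.
sample = IOBuffer("animal1,0,1,2,0\nanimal2,2,2,0,1\nanimal3,1,0,0,2\n")
ids, genotypes = read_genotypes(sample)
```

Storing the codes as `Int8` keeps memory small for very wide files. Whether to keep individuals as rows or as columns depends on the downstream analysis, as noted above.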
Ok, so the `CSV.TransposedSource` type is now officially on master. It allows convenient reading of transposed csv data into Julia structures. It's extremely accessible, you just have to do `CSV.read(file; transpose=true)`. For the original file here, I get timings of ~6s, which is about 3x `Base.readdlm` and the resulting dataset is probably much more useful (no additional transposing needed).
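A small sketch of what the `transpose=true` keyword does, assuming the CSV.jl package is installed. It uses `CSV.File`, which, as far as I know, is the entry point in more recent CSV.jl releases that accepts this keyword; the `CSV.read(file; transpose=true)` form above is the one from the time of this thread. The file contents and IDs are made up for illustration.

```julia
using CSV  # assumes the CSV.jl package is installed

# Write a small stand-in file: one individual per row, ID first, no header.
path = tempname()
write(path, "animal1,0,1,2,0\nanimal2,2,2,0,1\nanimal3,1,0,0,2\n")

# With transpose=true each row of the file is read as a column, so the
# leading ID of each row is used as that column's name.
f = CSV.File(path; transpose=true)

names_read = propertynames(f)   # column names taken from the IDs
first_col = f.animal1           # SNP codes for the first individual
```

Each individual ends up as one named column, so no separate transposition step is needed after reading.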
I am having a similar issue where CSV.read() on a 212×1005 matrix is excruciatingly slow (didn't complete in 3 minutes) while readtable() finishes in like 3 seconds. This is also a genetics dataset and I do not wish to transpose it later. I could share the dataset I'm reading as well.
I was using readtable, but upgraded to the newest version of Julia. I tried to use CSV.read in my old code, but a relatively small dataset of 2K×3K took forever.