Comments (6)
Should be closed by JuliaLang/METADATA.jl#4014
from csv.jl.
(i.e. just do a `Pkg.update()` now and you should have the latest version)
from csv.jl.
This is not resolved:
I tested again today (on two different Win7 machines). Here are the warmed-up timings for a 1000-row file:
julia> f="c:\temp\julia1k.csv"
"c:\temp\julia1k.csv"
julia> @time f1=readcsv(f);
0.052015 seconds (239.86 k allocations: 8.536 MB)
julia> @time df=readtable(f);
0.064770 seconds (224.02 k allocations: 10.468 MB, 18.82% gc time)
julia> @time f2=CSV.read(f,rows_for_type_detect=1000);
11.974838 seconds (1.81 M allocations: 74.173 MB, 0.12% gc time)
julia> size(f1)
(1000,77)
julia> size(df)
(999,77)
I apologize to the user @time for pinging her/him.
from csv.jl.
@kafisatz, can you share some more details around the slowness you were seeing? Namely:
- Julia/package versions (`versioninfo(true)`)
I tried to dig into this again this morning, but I'm seeing comparable parsing times between my Mac and a Windows machine on Julia 0.4.1 and the latest CSV master (`Pkg.checkout("CSV")`).
from csv.jl.
Hi quinn. It is very fast now.
What I do not fully understand is how to get the data into a format that is usable for me as a layman (e.g. an array, a vector of vectors, a DataFrame or something similar).
`CSV.csv` takes extremely long compared to `readtable` (DataFrames); see below.
The file I read has 100,000 rows and 77 columns.
julia> @time f1=readcsv(f);
3.356507 seconds (27.93 M allocations: 913.321 MB, 5.40% gc time)
julia> @time f2=CSV.read(f);
0.016803 seconds (18 allocations: 41.813 MB)
julia> @time dt=CSV.csv(f,rows_for_type_detect=10000)
164.831213 seconds (25.93 M allocations: 940.161 MB, 0.43% gc time)
julia> @time df=readtable(f);
3.640882 seconds (26.84 M allocations: 969.908 MB)
from csv.jl.
hey @kafisatz, a couple of things here:
- Using such a high number for `rows_for_type_detect` will always make it pretty slow. If you're having trouble getting the right types, it's much faster/easier to use the `types` argument, something like `types=Dict(1=>Float64)`, in order to specify that the first column should be Float64.
- There's currently a bug where `CSV.read` is not actually calling into CSV, but just the regular `Base.read` method, which just reads the file in as an array of bytes; I should probably try to make that an error somehow.
- I'll add some more documentation for this, but it's pretty painless to convert the result of `CSV.csv` to a DataFrame, for example:
using DataFrames
using CSV
dt = CSV.csv("myfile.csv")
df = DataFrame(dt) # converts our Data.Table `dt` to a DataFrame without copying