Comments (6)

quinnj commented on July 18, 2024

Should be closed by JuliaLang/METADATA.jl#4014

from csv.jl.

quinnj commented on July 18, 2024

(i.e. just do a Pkg.update() now and you should have the latest version)

kafisatz commented on July 18, 2024

@quinnj

This is not resolved.
I tested again today (on two different Win7 machines). Here are the warmed-up timings for a 1000-row file:

julia> f="c:\\temp\\julia1k.csv"
"c:\\temp\\julia1k.csv"

julia> @time f1=readcsv(f);
0.052015 seconds (239.86 k allocations: 8.536 MB)

julia> @time df=readtable(f);
0.064770 seconds (224.02 k allocations: 10.468 MB, 18.82% gc time)

julia> @time f2=CSV.read(f,rows_for_type_detect=1000);
11.974838 seconds (1.81 M allocations: 74.173 MB, 0.12% gc time)

julia> size(f1)
(1000,77)

julia> size(df)
(999,77)

I apologize to the user @time for pinging her/him

quinnj commented on July 18, 2024

@kafisatz, can you share some more details around the slowness you were seeing? Namely:

  • Julia/package versions (versioninfo(true))

I tried to dig into this again this morning, but I'm seeing comparable parsing times between my Mac and a Windows machine on Julia 0.4.1 with the latest CSV master (Pkg.checkout("CSV")).

kafisatz commented on July 18, 2024

Hi quinn. It is very fast now.
What I do not fully understand is how to get the data into a format that is usable for me as a layman (e.g. an array, a vector of vectors, a DataFrame or something similar).

CSV.csv takes extremely long compared to readtable (DataFrames), see below.
The file I read has 100,000 rows and 77 columns.

julia> @time f1=readcsv(f);
3.356507 seconds (27.93 M allocations: 913.321 MB, 5.40% gc time)

julia> @time f2=CSV.read(f);
0.016803 seconds (18 allocations: 41.813 MB)

julia> @time dt=CSV.csv(f,rows_for_type_detect=10000)
164.831213 seconds (25.93 M allocations: 940.161 MB, 0.43% gc time)

julia> @time df=readtable(f);
3.640882 seconds (26.84 M allocations: 969.908 MB)

quinnj commented on July 18, 2024

Hey @kafisatz, a couple of things here:

  • Using such a high number for rows_for_type_detect will always make it pretty slow. If you're having trouble getting the right types, it's much faster and easier to use the types argument, something like types=Dict(1=>Float64), to specify that the first column should be Float64.
  • There's currently a bug where CSV.read is not actually calling into CSV, but just the regular Base.read method, which reads the file in as an array of bytes; I should probably try to make that an error somehow.
  • I'll add some more documentation for this, but it's pretty painless to convert the result of CSV.csv to a DataFrame, for example:
using DataFrames
using CSV
dt = CSV.csv("myfile.csv")
df = DataFrame(dt)  # converts our Data.Table `dt` to a DataFrame without copying
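
To illustrate the second point above (why the CSV.read call in the timings finished almost instantly with only a handful of allocations), here is a minimal sketch of what the plain Base.read method does with a file name. It assumes a recent Julia 1.x rather than the 0.4 series used in this thread, and the temporary file and its contents are made up for illustration:

```julia
# Write a tiny CSV file to a temporary path (illustrative contents only).
path = tempname()
write(path, "a,b\n1,2\n3,4\n")

# Base.read on a file name returns the raw bytes of the file; nothing is parsed.
bytes = read(path)
println(typeof(bytes))
println(length(bytes) == filesize(path))

# The text is all there, but it has not been split into typed columns.
# Here we only split it into lines by hand to show what is inside.
text = String(copy(bytes))
rows = split(strip(text), '\n')
println(rows)

# Clean up the temporary file.
rm(path)
```

So a fast timing from such a call says nothing about parsing speed: the result is one byte buffer roughly the size of the file, not a table.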

