Comments (4)
On current master, I can read the yellow_tripdata file in about 14s. By comparison, TextParse.jl takes about 20s, while pandas takes around 40s.
from csv.jl.
Is there a way you can share the dataset, even if just privately? I'm happy to profile it on my side and post what seems to help.
from csv.jl.
https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2015-01.csv
Try this one; I am also currently working with it. This is just one month's data: there are 11 more such files, and together they only cover 2015. But try this one first.
from csv.jl.
I can shave 20% off the read time by running `@time df = CSV.read("issue51.csv"; types=Dict(2=>WeakRefString{UInt8}, 3=>WeakRefString{UInt8}))`, i.e. specifying that the 2nd and 3rd columns should be read as strings instead of DateTime. This should be fixed in 0.6, however, since DateTime parsing has gotten much faster. I'm trying to get CSV working on 0.6 at the moment; otherwise I would check right now.
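For anyone wanting to try the same column override today, here is a minimal sketch. It assumes a recent CSV.jl release, where plain `String` can be passed in `types` in place of the `WeakRefString{UInt8}` used above; that substitution is an assumption about newer versions, not something confirmed in this thread. The tiny file and its contents are hypothetical stand-ins for the real trip data.

```julia
using CSV

# Write a tiny stand-in file (hypothetical rows; the real data is the
# yellow_tripdata file linked earlier in the thread).
path = joinpath(mktempdir(), "sample.csv")
write(path, """
VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count
2,2015-01-15 19:05:39,2015-01-15 19:23:42,1
1,2015-01-10 20:33:38,2015-01-10 20:53:28,1
""")

# Default read: CSV.jl chooses each column's type on its own.
detected = CSV.File(path)

# Override columns 2 and 3 by position so they are kept as strings
# and no date/time parsing is attempted for them.
as_strings = CSV.File(path; types=Dict(2 => String, 3 => String))

println(eltype(as_strings.tpep_pickup_datetime))
println(eltype(as_strings.tpep_dropoff_datetime))
```

Wrapping each `CSV.File` call in `@time` on the full file is the simplest way to see whether the override still makes a difference on a given version.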
from csv.jl.