Comments (8)
Hmm... we should actually be accounting for quoted strings/other types during type detection. Can you post a specific case (file) where this isn't happening? This is definitely a bug.
from csv.jl.
It's attenu.csv.gz
I get
CSV.CSVError("error parsing a `Int64` value on column 3, row 170; encountered 'c'")
from csv.jl.
I reported this issue and fixed it in #65, but on master the issue is present once again (the "quoted numbers detected as string column" test that I provided was also modified to expect that the 2nd (quoted) column is Int).
Is it because the logic was changed to ignore the quotes when detecting column types?
A somewhat similar issue is that during type detection, quoted nulls ("NA") are detected as nulls.
from csv.jl.
@alyst, sorry for what's happened here. With the port to Nulls, I took the time to do quite a number of large refactorings, involving initial type detection, parsing, streaming, etc. In the process, there were some two dozen other issues closed, and I tried to make sure all existing issues stayed resolved, but I was worried that something would regress.
In this case, the behavior that has changed, and that I'd like to support, is that quoted fields are not automatically treated as Strings. The reasoning here is that I've personally encountered several different csv sources where, for one reason or another, a system chooses to quote all fields, regardless of whether they are strings or contain characters that need escaping.
In the case of the attenu.csv.gz file, the correct way to read that file would now be
julia> df = CSV.read(joinpath(dir, "attenu.csv"); null="NA", types=Dict(3=>Union{Null, String}))
i.e. it's pretty easy to manually specify that the 3rd column should be Strings with null values as "NA".
Does that all make sense? Sorry again if this has messed anything up at all.
from csv.jl.
The mode that doesn't automatically treat quoted columns as strings definitely makes sense. But for me it's rather an indication of a problem with the .csv file.
Maybe it's possible to add an option (say, quoted_values=:string/:detect) specifying whether to always treat quoted columns as strings or to ignore the quotes and try to detect the type of the value.
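To make the proposal concrete, here is a minimal sketch of how such a switch could behave during type detection. It is not CSV.jl's implementation or API: the function `detect_column_type`, the helpers, and the `quoted_values` and `missingstring` keywords are hypothetical names used only for illustration, and only Int and String columns are considered.

```julia
# Hypothetical sketch: NOT CSV.jl's implementation or API.
# `detect_column_type`, `quoted_values` and `missingstring` are illustrative names.

# A cell counts as quoted when it is wrapped in a pair of double quotes.
isquoted(cell::AbstractString) =
    length(cell) >= 2 && startswith(cell, "\"") && endswith(cell, "\"")

# Strip one pair of surrounding double quotes, if present.
unquote(cell::AbstractString) = isquoted(cell) ? chop(cell; head=1, tail=1) : cell

function detect_column_type(cells; quoted_values::Symbol=:detect, missingstring="NA")
    quoted_values in (:string, :detect) ||
        throw(ArgumentError("quoted_values must be :string or :detect"))
    sawmissing = false
    allint = true
    for cell in cells
        if quoted_values == :string && isquoted(cell)
            # Any quoted cell forces a String column; a quoted "NA" stays a string.
            allint = false
            continue
        end
        value = unquote(cell)
        if value == missingstring
            sawmissing = true
        elseif tryparse(Int, value) === nothing
            allint = false
        end
    end
    T = allint ? Int : String
    return sawmissing ? Union{Missing, T} : T
end

cells = ["1", "\"2\"", "\"NA\""]
int_or_missing = detect_column_type(cells)                    # quotes ignored
always_string = detect_column_type(cells; quoted_values=:string)
```

With `:detect` the quotes are ignored, so the quoted number is parsed as an integer and the quoted "NA" counts as a missing value; with `:string` any quoted cell makes the whole column a String column.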
from csv.jl.
Re attenu.csv.gz, the problem is that the column is inferred as non-null, but then a null occurs during actual parsing.
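A small sketch of this failure mode, under the assumption (not confirmed in this thread) that detection only inspects a prefix of the rows. The functions `parse_int_column` and `parse_with_widening` and the `detect_rows` keyword are hypothetical and do not reflect CSV.jl internals.

```julia
# Hypothetical sketch: NOT CSV.jl internals. Assumes detection looks only at
# the first `detect_rows` cells of a column.

function parse_int_column(cells::Vector{String}; missingstring="NA", detect_rows=2)
    # Detection pass: decide whether the column may hold missing values.
    nsample = min(detect_rows, length(cells))
    allowmissing = any(c -> c == missingstring, cells[1:nsample])
    T = allowmissing ? Union{Missing, Int} : Int
    column = Vector{T}(undef, length(cells))
    # Parsing pass: a missing value after the sampled rows breaks the inferred type.
    for (row, cell) in enumerate(cells)
        if cell == missingstring
            allowmissing || error("row $row: missing value in a column inferred as $T")
            column[row] = missing
        else
            column[row] = parse(Int, cell)
        end
    end
    return column
end

# One possible remedy: widen the element type when the first missing value appears.
function parse_with_widening(cells::Vector{String}; missingstring="NA")
    column = Int[]
    for cell in cells
        if cell == missingstring
            if !(Missing <: eltype(column))
                column = convert(Vector{Union{Missing, Int}}, column)
            end
            push!(column, missing)
        else
            push!(column, parse(Int, cell))
        end
    end
    return column
end

cells = ["1", "2", "NA"]
widened = parse_with_widening(cells)
```

Calling `parse_int_column(cells)` with the default `detect_rows=2` throws, because the "NA" sits outside the sampled rows, while `parse_with_widening(cells)` promotes the column instead of failing.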
from csv.jl.
FWIW in R, when we remove from attenu a few rows with non-digits in the 3rd column, read.csv(stringsAsFactors=FALSE) imports the 3rd column as int and treats quoted NAs as NA.
readr::read_csv() also imports it as int. However, read_csv() has a quoted_na option (TRUE by default), which specifies whether to treat "NA" as NA or as a string (for attenu.csv.gz, though, this option doesn't seem to have any effect: the 3rd column at the 79th row is always NA).
from csv.jl.
This should be fixed on master with the switch to CSV.File (CSV.read still relies on the old CSV.Source, but there are plans to switch it over).
Note that for now, you can get a NamedTuple of Vectors on master by doing using Tables; table = CSV.File(file; kwargs...) |> columntable
from csv.jl.