juliadata / parsers.jl
fast parsing machinery for basic types in Julia
License: MIT License
From worker 3: │ === EXCEPTION SUMMARY ===
From worker 3: │
From worker 3: │ BoundsError: attempt to access 37-element view(::Vector{UInt8}, 122:158) with eltype UInt8 at index [38]
From worker 3: │ [1] throw_boundserror(A::SubArray{UInt8, 1, Vector{UInt8}, Tuple{UnitRange{Int32}}, true}, I::Tuple{Int64})
From worker 3: │ @ Base ./abstractarray.jl:703
From worker 3: │
From worker 3: │ ===========================
From worker 3: │
From worker 3: │ Original Error message:
From worker 3: │
From worker 3: │ ERROR: BoundsError: attempt to access 37-element view(::Vector{UInt8}, 122:158) with eltype UInt8 at index [38]
From worker 3: │ Stacktrace:
From worker 3: │ [1] throw_boundserror(A::SubArray{UInt8, 1, Vector{UInt8}, Tuple{UnitRange{Int32}}, true}, I::Tuple{Int64})
From worker 3: │ @ Base ./abstractarray.jl:703
From worker 3: │ [2] checkbounds
From worker 3: │ @ ./abstractarray.jl:668 [inlined]
From worker 3: │ [3] getindex
From worker 3: │ @ ./subarray.jl:314 [inlined]
From worker 3: │ [4] peekbyte
From worker 3: │ @ ~/.julia/packages/Parsers/mPY37/src/utils.jl:356 [inlined]
From worker 3: │ [5] typeparser(#unused#::Type{Base.UUID}, source::SubArray{UInt8, 1, Vector{UInt8}, Tuple{UnitRange{Int32}}, true}, pos::Int64, len::Int64, b::UInt8, code::Int16, pl::Parsers.PosLen, options::Parsers.Options)
From worker 3: │ @ Parsers ~/.julia/packages/Parsers/mPY37/src/hexadecimal.jl:94
From worker 3: │ [6] #23
From worker 3: │ @ ~/.julia/packages/Parsers/mPY37/src/components.jl:382 [inlined]
From worker 3: │ [7] checkforsentinel
From worker 3: │ @ ~/.julia/packages/Parsers/mPY37/src/components.jl:244 [inlined]
From worker 3: │ [8] stripwhitespace
From worker 3: │ @ ~/.julia/packages/Parsers/mPY37/src/components.jl:89 [inlined]
From worker 3: │ [9] findquoted
From worker 3: │ @ ~/.julia/packages/Parsers/mPY37/src/components.jl:220 [inlined]
From worker 3: │ [10] stripwhitespace
From worker 3: │ @ ~/.julia/packages/Parsers/mPY37/src/components.jl:89 [inlined]
From worker 3: │ [11] _finddelimiter
From worker 3: │ @ ~/.julia/packages/Parsers/mPY37/src/components.jl:368 [inlined]
From worker 3: │ [12] checkemptysentinel
From worker 3: │ @ ~/.julia/packages/Parsers/mPY37/src/components.jl:40 [inlined]
From worker 3: │ [13] #6
From worker 3: │ @ ~/.julia/packages/Parsers/mPY37/src/components.jl:13 [inlined]
From worker 3: │ [14] _xparse
From worker 3: │ @ ~/.julia/packages/Parsers/mPY37/src/Parsers.jl:343 [inlined]
Looks like #50 didn't fix this completely
julia> CSV.read("/Users/andreasnoack/Desktop/testdot.csv")
1×2 DataFrames.DataFrame
│ Row │ a │ b │
│ │ Float64 │ String │
├─────┼─────────┼────────┤
│ 1 │ 0.0 │ . │
julia> readlines("/Users/andreasnoack/Desktop/testdot.csv")
2-element Array{String,1}:
"a, b"
"., ."
The order of arguments is the opposite of Base.parse. Types are usually the first argument to a function. Is there a reason for choosing the non-standard order here?
I am investigating this issue JuliaData/CSV.jl#359, which I think comes from Parsers.jl. When running tests on both the released version and master of Parsers.jl on Julia master (currently v"1.1.0-DEV.794"), I get many errors of the form
Misc: Error During Test at /home/tamas/.julia/dev/Parsers/test/runtests.jl:596
Got exception outside of a @test
unsupported or misplaced expression error
Stacktrace:
[1] top-level scope
[2] match! at /home/tamas/.julia/dev/Parsers/src/tries.jl:137 [inlined]
[3] #parse!#25 at /home/tamas/.julia/dev/Parsers/src/Parsers.jl:588 [inlined]
[4] parse! at /home/tamas/.julia/dev/Parsers/src/Parsers.jl:578 [inlined]
[5] #parse!#23 at /home/tamas/.julia/dev/Parsers/src/Parsers.jl:508 [inlined]
[6] parse! at /home/tamas/.julia/dev/Parsers/src/Parsers.jl:483 [inlined]
[7] #parse!#22 at /home/tamas/.julia/dev/Parsers/src/Parsers.jl:384 [inlined]
[8] parse! at /home/tamas/.julia/dev/Parsers/src/Parsers.jl:384 [inlined]
[9] #parse#11 at /home/tamas/.julia/dev/Parsers/src/Parsers.jl:305 [inlined]
[10] parse(::Parsers.Delimited{false,false,Parsers.Quoted{Parsers.Sentinel{typeof(Parsers.defaultparser),Parsers.Trie{0x00,false,missing,2,Tuple{Parsers.Trie{0x6e,false,missing,2,Tuple{Parsers.Trie{0x75,false,missing,2,Tuple{Parsers.Trie{0x6c,false,missing,2,Tuple{Parsers.Trie{0x6c,true,missing,2,Tuple{}}}}}}}}}}}},Parsers.Trie{0x00,false,missing,8,Tuple{Parsers.Trie{0x2c,true,missing,8,Tuple{}}}}}, ::Base.GenericIOBuffer{Array{UInt8,1}}, ::Type{Int64}) at /home/tamas/.julia/dev/Parsers/src/Parsers.jl:305
[11] top-level scope at /home/tamas/.julia/dev/Parsers/test/runtests.jl:634
[12] top-level scope at /home/tamas/src/julia-git/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
[13] top-level scope at /home/tamas/.julia/dev/Parsers/test/runtests.jl:598
[14] top-level scope at /home/tamas/src/julia-git/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
[15] top-level scope at /home/tamas/.julia/dev/Parsers/test/runtests.jl:7
[16] include at ./boot.jl:317 [inlined]
[17] include_relative(::Module, ::String) at ./loading.jl:1038
[18] include(::Module, ::String) at ./sysimg.jl:29
[19] include(::String) at ./client.jl:403
[20] top-level scope at none:0
[21] eval(::Module, ::Any) at ./boot.jl:319
[22] exec_options(::Base.JLOptions) at ./client.jl:243
[23] _start() at ./client.jl:436
Julia 1.0.2 works fine. Manifest.toml below.
Please let me know if this issue belongs in PackageCompiler.jl instead of here.
This issue seems to be related to @n8xm's #83 issue, so I'll be using test values from there.
I originally had this crash in a compiled package using JSON.jl, but the issue seems to come from Parsers.jl. Consider:
module Foo
using Parsers
julia_main()::Cint = main()
function main()
    try
        str_arg = ARGS[1]
        @info "Parsing \"$str_arg\" -> $(Parsers.parse(Float64,str_arg))"
    catch
        Base.invokelatest(Base.display_error, Base.catch_stack())
        return 1
    end
    return 0
end
end
With
$ ./Foo_compiled/bin/Foo "9.3494547075363499E-311"
[ Info: Parsing "9.3494547075363499E-311" -> 0.0
$ ./Foo_compiled/bin/Foo "9.349454707536349E-311"
realloc(): invalid pointer
signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f01ef4ce3ed)
unknown function (ip: 0x7f01ef4d647b)
realloc at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
__gmpz_realloc at /home/user/.julia/dev/Foo_compiled/bin/../lib/julia/libgmp.so (unknown line)
__gmpz_import at /home/user/.julia/dev/Foo_compiled/bin/../lib/julia/libgmp.so (unknown line)
_widen at /home/user/.julia/packages/Parsers/wEs2o/src/floats.jl:13 [inlined]
_scale at /home/user/.julia/packages/Parsers/wEs2o/src/floats.jl:445 [inlined]
_scale at /home/user/.julia/packages/Parsers/wEs2o/src/floats.jl:431 [inlined]
scale at /home/user/.julia/packages/Parsers/wEs2o/src/floats.jl:392 [inlined]
parseexp at /home/user/.julia/packages/Parsers/wEs2o/src/floats.jl:334 [inlined]
parsefrac at /home/user/.julia/packages/Parsers/wEs2o/src/floats.jl:309 [inlined]
parsedigits at /home/user/.julia/packages/Parsers/wEs2o/src/floats.jl:242 [inlined]
typeparser at /home/user/.julia/packages/Parsers/wEs2o/src/floats.jl:170 [inlined]
xparse2 at /home/user/.julia/packages/Parsers/wEs2o/src/Parsers.jl:691 [inlined]
xparse2 at /home/user/.julia/packages/Parsers/wEs2o/src/Parsers.jl:671 [inlined]
parse at /home/user/.julia/packages/Parsers/wEs2o/src/Parsers.jl:170
parse at /home/user/.julia/packages/Parsers/wEs2o/src/Parsers.jl:170 [inlined]
macro expansion at ./logging.jl:340 [inlined]
main at /home/user/.julia/dev/Foo/src/Foo.jl:11
julia_main at /home/user/.julia/dev/Foo/src/Foo.jl:5 [inlined]
julia_main at ./none:36
unknown function (ip: 0x7f01e2949b5c)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
julia_main at /home/user/.julia/dev/Foo_compiled/bin/Foo.so (unknown line)
main at ./Foo_compiled/bin/Foo (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at ./Foo_compiled/bin/Foo (unknown line)
Allocations: 2285413 (Pool: 2284512; Big: 901); GC: 3
Aborted (core dumped)
Please let me know if you need additional details.
Here is a part of the docs for getstring:
If the actual parsed String is needed, however, you can pass your source and the res.val::PosLen to Parsers.getstring to get the actual parsed String value.
But getstring takes 3 arguments, the last of which is called e. I don't see any description of e in the docstring or of how it should be used. Looking at the tests, it seems that 0x00 is passed in.
Such as InlineStrings.jl, CSV.jl, ...
To avoid issues like #133
We could use a GitHub Action workflow like https://github.com/JuliaDiff/ChainRulesCore.jl/blob/v1.15.6/.github/workflows/IntegrationTest.yml
Parsing floating point numbers with a leading point fails:
julia> Parsers.parse(".24409E+03", Float64)
ERROR: Parsers.Error (INVALID: EOF):
initial value parsing failed, reached EOF
failed to parse Float64, encountered: '\0'
At the same time, Base.parse has no problem with this format:
julia> Base.parse(Float64, ".24409E+03")
244.09
julia> str = "{hey, there }"
"{hey, there }"
Expected behaviour, with delim=',' (and default wh1, wh2):
julia> res = Parsers.xparse(String, str; openquotechar='{', closequotechar='}', stripquoted=true, delim=',')
Parsers.Result{PosLen}(37, 13, PosLen(0x000000000020000a))
julia> Parsers.getstring(str, res.val, 0x22)
"hey, there"
Unexpected behaviour, with delim=' ' (and wh1 changed, since it's required):
julia> res = Parsers.xparse(String, str; openquotechar='{', closequotechar='}', stripquoted=true, delim=' ', wh1=0x00)
Parsers.Result{PosLen}(37, 13, PosLen(0x000000000020000b))
julia> Parsers.getstring(str, res.val, 0x22) # has trailing whitespace
"hey, there "
The trailing whitespace in the quoted string isn't stripped in the second case, because we explicitly set wh1 to a value that wasn't ' ' due to ' ' being the delimiter... but there's no way to tell Parsers "inside quotes, treat ' ' as whitespace, not the delimiter, just like you treat ',' as a regular character inside quotes even when comma is the delimiter".
One way to fix this would be to hardcode certain characters as always being whitespace when quoted e.g.
- if options.stripquoted && b != options.wh1 && b != options.wh2
+ if options.stripquoted && b != options.wh1 && b != options.wh2 && b != UInt8(' ') && b != UInt8('\t')
lastnonwhitespacepos = pos
end
Lines 100 to 102 in 462fb55
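The idea behind the proposed change can be sketched outside of Parsers.jl. The helper below is hypothetical (its name and signature are not part of the package) and uses only Base: it scans the bytes of a quoted field and remembers the last position that is not whitespace, always counting ' ' and '\t' as whitespace in addition to whatever wh1 and wh2 are set to. The wh2 value used here is an assumption for illustration.

```julia
# Hypothetical standalone helper (not Parsers.jl code): return the position of
# the last byte that is not whitespace. ' ' and '\t' always count as
# whitespace, in addition to the configured wh1/wh2 bytes.
function last_nonwhitespace_pos(bytes, wh1::UInt8, wh2::UInt8)
    lastpos = 0
    for (pos, b) in enumerate(bytes)
        if b != wh1 && b != wh2 && b != UInt8(' ') && b != UInt8('\t')
            lastpos = pos
        end
    end
    return lastpos
end

inner = "hey, there "   # contents between '{' and '}' in the example above
n = last_nonwhitespace_pos(codeunits(inner), 0x00, UInt8('\t'))
stripped = inner[1:n]   # "hey, there", even though wh1 is 0x00
```

With this check the trailing space is dropped regardless of which byte is used as the delimiter, which is the behaviour the issue asks for.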
Trying to read a CSV file with a Char column throws an error:
df = DataFrame(a = [1,2,3], b = ['q','w','e'])
CSV.write("test.csv", df)
CSV.read("test.csv", types = [Int, Char]) # ERROR: MethodError: no method matching zero(::Type{Char})
If I use Base.parse, I get different results than if I use Parsers.parse:
julia> parse(Float64,"9.3494547075363499E-311")
9.3494547075363e-311
julia> parse(Float64,"9.349454707536349E-311")
9.3494547075363e-311
julia> using Parsers
julia> Parsers.parse(Float64,"9.3494547075363499E-311")
0.0
julia> Parsers.parse(Float64,"9.349454707536349E-311")
9.3494547075363e-311
I believe what is happening is that:
It is my understanding that subnormal floats sacrifice precision in order to allow representations that are "close" to zero. However, because 9.3494547075363499E-311 is larger than the smallest subnormal float, the behavior of Base.parse is what I would expect. I would only expect a parser to return 0.0 if the number being parsed is smaller than the smallest subnormal float.
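The range in question can be inspected with Base functions alone. This is only a small check of the claim above, not a statement about how either parser works internally:

```julia
# Smallest positive subnormal Float64 (about 4.9e-324)
smallest_subnormal = nextfloat(0.0)
# Smallest positive normal Float64 (about 2.2e-308)
smallest_normal = floatmin(Float64)

x = Base.parse(Float64, "9.3494547075363499E-311")

@show smallest_subnormal smallest_normal
@show issubnormal(x)                             # true: x is in the subnormal range
@show smallest_subnormal < x < smallest_normal   # true: x is far above the smallest subnormal
```

Since the value lies well inside the representable subnormal range, rounding it to 0.0 loses information that Float64 can hold.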
By the way, Python's built-in parser behaves like Base.parse and not Parsers.parse. Using Python 3.9.6:
>>> float("9.3494547075363499E-311")
9.3494547075363e-311
>>> float("9.349454707536349E-311")
9.3494547075363e-311
Similarly, C's atof behaves like Base.parse and not Parsers.parse:
printf("%e\n", atof("9.3494547075363499E-311"));
printf("%e\n", atof("9.349454707536349E-311"));
The above C source produces the output:
9.349455e-311
9.349455e-311
Is there a good reason why Parsers.parse behaves differently from Base.parse in this case?
OpenML depends on ARFFFiles.jl, which in turn depends on Parsers.jl
No error is thrown below if I pin Parsers to 2.4:
julia> OpenML.load(42638)
ERROR: Invalid nominal "\"C85\"" in column 'cabin' of row 2, expecting one of "B45", "E31", "B57 B59 B63 B66", "B36", "A21", "C78", "D34", "D19", "A9", "D15", "D56", "C103", "C123", "C31", "C23 C25 C27", "F G63", "B61", "C53", "D43", "C130", "C132", "C101", "C55 C57", "B71", "C46", "C116", "F", "A29", "G6", "C6", "C28", "C51", "E46", "C54", "C97", "D22", "B10", "F4", "E45", "E52", "D30", "B58 B60", "E34", "C62 C64", "A11", "B11", "C80", "F33", "C85", "D37", "C86", "D21", "C89", "F E46", "A34", "D", "B26", "C22 C26", "B69", "C32", "B78", "F E57", "F2", "A18", "C106", "B51 B53 B55", "D10 D12", "E60", "E50", "E39 E41", "B52 B54 B56", "C39", "B24", "D28", "B41", "C7", "D40", "D38", "C105", "A6", "D33", "B30", "C52", "B28", "C83", "F G73", "A5", "D26", "C110", "E101", "F E69", "D47", "B86", "C2", "E33", "B19", "A7", "C49", "A32", "B4", "B80", "A31", "D36", "C93", "D35", "C87", "B77", "E67", "B94", "C125", "C99", "C118", "D7", "A19", "B49", "C65", "E36", "B18", "C124", "C91", "E40", "T", "C128", "B35", "C82", "B96 B98", "E10", "E44", "C104", "C111", "C92", "E38", "E12", "E63", "A14", "B37", "C30", "D20", "B79", "E25", "D46", "B73", "C95", "B38", "B39", "B22", "C70", "A16", "C68", "A10", "E68", "A20", "D50", "D9", "A23", "B50", "A26", "D48", "E58", "C126", "D49", "B5", "B20", "E24", "C90", "C45", "E8", "B101", "D45", "E121", "D11", "E77", "F38", "B3", "D6", "B82 B84", "D17", "A36", "B102", "E49", "C47", "E17", "A24", "C50", "B42" or "C148"
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] _readcolumns_readdatum
@ ~/.julia/packages/ARFFFiles/o3ClW/src/ARFFFiles.jl:965 [inlined]
[3] _readcolumns_readdatum
@ ~/.julia/packages/ARFFFiles/o3ClW/src/ARFFFiles.jl:900 [inlined]
[4] readcolumns(r::ARFFFiles.ARFFReader{IOStream}; opts_sq::Parsers.Options, opts_dq::Parsers.Options, date_opts_sq::Vector{Parsers.Options}, date_opts_dq::Vector{Parsers.Options}, maxbytes::Nothing, chunkbytes::Int64)
@ ARFFFiles ~/.julia/packages/ARFFFiles/o3ClW/src/ARFFFiles.jl:847
[5] #3
@ ~/.julia/packages/OpenML/dTbTl/src/data.jl:87 [inlined]
[6] load(::OpenML.var"#3#5"{Nothing}, ::String; opts::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ ARFFFiles ~/.julia/packages/ARFFFiles/o3ClW/src/ARFFFiles.jl:577
[7] load
@ ~/.julia/packages/ARFFFiles/o3ClW/src/ARFFFiles.jl:574 [inlined]
[8] load(id::Int64; maxbytes::Nothing)
@ OpenML ~/.julia/packages/OpenML/dTbTl/src/data.jl:87
[9] load(id::Int64)
@ OpenML ~/.julia/packages/OpenML/dTbTl/src/data.jl:69
[10] top-level scope
@ REPL[7]:1
Here I'm running in an env that has only OpenML 0.3.0 in it.
In both my working and failing environments, the version of ARFFFiles.jl is the same, namely 1.4.1.
❯ julia --check-bounds=yes --compiled-modules=no
julia> using Parsers
julia> Parsers.checkdelim!(codeunits("::::"), 1, 4, Parsers.Options(delim = "::", ignorerepeated = true)) == 5
ERROR: BoundsError: attempt to access 4-codeunit String at index [5]
Stacktrace:
[1] checkbounds
@ ./strings/basic.jl:216 [inlined]
[2] codeunit
@ ./strings/string.jl:117 [inlined]
[3] getindex
@ ./strings/basic.jl:756 [inlined]
[4] peekbyte
@ ~/.julia/packages/Parsers/gi2J3/src/utils.jl:345 [inlined]
[5] checkdelim!(source::Base.CodeUnits{UInt8, String}, pos::Int64, len::Int64, options::Parsers.Options)
@ Parsers ~/.julia/packages/Parsers/gi2J3/src/Parsers.jl:375
[6] top-level scope
@ REPL[2]:1
(running with --compiled-modules=no to make sure no precompile file with bounds checking turned off is cached)
julia> using Parsers
julia> Parsers.parse(Int64, "-9223372036854775807")
9223372036854775807
julia> Parsers.parse(Int64, "-9223372036854775807\t")
9223372036854775807
Obviously the sign is missing. This was described in Slack.
I will submit a fix in a new PR immediately.
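For reference, one common way to keep the sign intact near the Int64 limits is to accumulate digits as a negative number and negate only at the end. The function below is a hypothetical Base-only sketch of that technique; it is not the Parsers.jl implementation and does not claim to be the fix that was submitted. Unlike the second example above, it rejects trailing characters such as '\t'.

```julia
# Hypothetical sketch: parse a decimal Int64, accumulating negatively so that
# typemin(Int64) is representable. Returns nothing on invalid input or overflow.
function parse_int64_sketch(s::AbstractString)
    bytes = codeunits(s)
    isempty(bytes) && return nothing
    neg = bytes[1] == UInt8('-')
    start = (neg || bytes[1] == UInt8('+')) ? 2 : 1
    start > length(bytes) && return nothing          # sign without digits
    acc = Int64(0)
    for i in start:length(bytes)
        b = bytes[i]
        UInt8('0') <= b <= UInt8('9') || return nothing
        digit = Int64(b - UInt8('0'))
        acc, overflowed = Base.Checked.mul_with_overflow(acc, Int64(10))
        overflowed && return nothing
        acc, overflowed = Base.Checked.sub_with_overflow(acc, digit)
        overflowed && return nothing
    end
    neg && return acc                                 # already negative
    acc == typemin(Int64) && return nothing           # +9223372036854775808 does not fit
    return -acc
end

parse_int64_sketch("-9223372036854775807")   # -9223372036854775807
```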
For quoted strings, I'd expect that two consecutive quotes represent an empty string (a known value), but I get a missing value instead:
julia> Parsers.xparse(String, "\"\",", 1, 3, Parsers.Options(quoted=true, sentinel=missing)).val.missingvalue
true
Having this would be helpful for CSV parsers that want to differentiate between unknown/missing strings and empty strings (as e.g. Postgres does):
"a","","c" # middle field holds an empty string
"a",,"c" # missing value in the middle field
When the input is an IO object, the Date parser always returns Date(1).
julia> using Dates
julia> Parsers.parse(Date, "2020-07-30")
2020-07-30
julia> Parsers.parse(Date, IOBuffer("2020-07-30"))
0001-01-01
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
From JuliaData/CSV.jl#397
Base.parse works correctly but Parsers.parse fails:
julia> Base.parse(Float64, "74810199.033988851037472901827191090834")
7.481019903398885e7
julia> Parsers.parse(Float64, "74810199.033988851037472901827191090834")
ERROR: InexactError: check_top_bit(Int64, -3)
Stacktrace:
[1] throw_inexacterror(::Symbol, ::Any, ::Int64) at ./boot.jl:583
[2] check_top_bit at ./boot.jl:597 [inlined]
[3] toUInt64 at ./boot.jl:708 [inlined]
[4] Type at ./boot.jl:738 [inlined]
[5] convert at ./number.jl:7 [inlined]
[6] cconvert at ./essentials.jl:355 [inlined]
[7] mul_2exp! at ./gmp.jl:146 [inlined]
[8] mul_2exp! at ./gmp.jl:148 [inlined]
[9] scale at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/floats.jl:96 [inlined]
[10] scale at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/floats.jl:114 [inlined]
[11] #_defaultparser#46(::UInt8, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Parsers.StringBuffer, ::Parsers.Result{Float64}, ::Type{Int128}) at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/floats.jl:196
[12] (::getfield(Parsers, Symbol("#kw##_defaultparser")))(::NamedTuple{(:decimal,),Tuple{UInt8}}, ::typeof(Parsers._defaultparser), ::Parsers.StringBuffer, ::Parsers.Result{Float64}, ::Type{Int128}) at ./none:0
[13] #_defaultparser#46 at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/floats.jl:194 [inlined]
[14] _defaultparser at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/floats.jl:126 [inlined]
[15] #defaultparser#45 at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/floats.jl:123 [inlined]
[16] defaultparser at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/floats.jl:123 [inlined]
[17] #parse!#13 at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/Parsers.jl:349 [inlined]
[18] parse! at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/Parsers.jl:349 [inlined]
[19] #parse#15 at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/Parsers.jl:351 [inlined]
[20] parse at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/Parsers.jl:351 [inlined]
[21] #parse#16(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Type{Float64}, ::String) at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/Parsers.jl:362
[22] parse(::Type{Float64}, ::String) at /Users/tomkwong/.julia/packages/Parsers/v5u2B/src/Parsers.jl:361
Lines 20 to 25 in 593c004
Maybe you want to try and upstream those definitions into Base?
I am new to Julia and still figuring things out but it seems that the recent changes have broken parse for tab delimited text (under Julia 1.1):
julia> Parsers.parse(Float64,"12.34\t5.0",Parsers.Options(delim='\t'))
ERROR: Parsers.Error (INVALID: OK | EOF | INVALID_DELIMITER ):
initial value parsing succeeded, reached EOF, invalid delimiter
attempted to parse Float64 from: "12.34\t5.0"
Stacktrace:
[1] #parse#4(::Int64, ::Int64, ::Function, ::Type{Float64}, ::String, ::Parsers.Options{false,false,false,Nothing,UInt8,Nothing}) at C:\Users\smury.julia\packages\Parsers\bkn21\src\Parsers.jl:109
[2] parse(::Type{Float64}, ::String, ::Parsers.Options{false,false,false,Nothing,UInt8,Nothing}) at C:\Users\smury.julia\packages\Parsers\bkn21\src\Parsers.jl:108
[3] top-level scope at none:0
but
julia> Parsers.parse(Float64,"12.34,5.0",Parsers.Options(delim=','))
12.34
Example:
jl> Parsers.parse(String, "a")
1
Not sure what's the intention for the parse-to-string API, but this is definitely a confusing result.
Hi there,
I'm parsing some data that has been written out from SAS.
Date("25JUL1985", "dduuuyyyy")
works, but
Parsers.parse(Date, "25JUL1985", Parsers.Options(dateformat="dduuuyyyy"))
doesn't.
Any ideas?
I am working with strings interspersed with letters. I'd like to parse numbers up to either the next letter or the end of the string. I can do this with xparse:
julia> Parsers.xparse(Int, "1234Q", 1, 5)
(1234, -32607, 1, 5, 5)
If a parse fails, the first tuple element x holds the result I'm looking for. However, it is not documented whether this is intentional behavior (that is, whether x is correct up to the moment the parse fails).
I'm curious: is this a sanctioned use of Parsers.xparse? The docs state that "x is a value of type T, even if parsing does not succeed" but do not say that the answer is guaranteed to be correct.
If it is guaranteed, I'd be happy to submit a docs PR. Thanks for your work on this package!
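Until the guarantee is documented, the same result can be obtained without relying on the value returned by a failed parse. The helper below is a hypothetical Base-only sketch (the name parse_leading_int is not part of Parsers.jl); it handles unsigned decimal digits only.

```julia
# Hypothetical helper using only Base: parse the leading run of ASCII digits.
function parse_leading_int(s::AbstractString)
    stop = findfirst(!isdigit, s)        # index of the first non-digit, or nothing
    prefix = stop === nothing ? s : SubString(s, 1, prevind(s, stop))
    isempty(prefix) && return nothing    # no leading digits at all
    return tryparse(Int, prefix)         # nothing if the digits overflow Int
end

parse_leading_int("1234Q")   # 1234
parse_leading_int("Q")       # nothing
```

This is slower than a single xparse call because it scans the digits twice, so it is a fallback rather than a replacement.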
Hey @quinnj,
I'm rewriting our data loading at the moment, migrating to Parsers.jl.
My request is more or less the opposite of #106: for CSV parsing, it would be great to have an option that allows us to strip whitespace around unquoted fields but leave it intact within quotes.
For example, a CSV
A, B , C,D
"hello", "good day" , " same same " , whatever
should ideally parse into ["A", "B", "C", "D"] for the first line and ["hello", "good day", " same same ", "whatever"] for the second.
Would it be straightforward to add that as an option?
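The requested behaviour can be sketched with Base string functions. The helper clean_field is hypothetical and not a Parsers.jl option; the naive split on ',' also assumes that no field contains a comma inside quotes.

```julia
# Hypothetical helper: strip whitespace around a field, then remove one pair of
# surrounding double quotes while keeping the whitespace inside them.
function clean_field(raw::AbstractString)
    field = strip(raw)
    if length(field) >= 2 && first(field) == '"' && last(field) == '"'
        return String(chop(field; head=1, tail=1))
    end
    return String(field)
end

header = "A, B , C,D"
row = "\"hello\", \"good day\" , \" same same \" , whatever"

clean_field.(split(header, ','))   # → ["A", "B", "C", "D"]
clean_field.(split(row, ','))      # → ["hello", "good day", " same same ", "whatever"]
```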
In #153, we removed JET from tests as it was erroring even with supported Julia versions, and was generally difficult to keep the nightly tests green, as JET relies on various implementation details that sometimes change between Julia nightly versions. We should re-enable it in some form in the future, perhaps as an independent CI action.
This originated from rofinn/FilePathsBase.jl#100
I'm using CSV.File to read a csv file where some of the columns contain file paths. So I figured I'd do this:
types = Dict(:name => typeof(Path()))
CSV.File(file, types = types)
But I'm getting this error:
ERROR: MethodError: no method matching zero(::Type{FilePathsBase.PosixPath})
Closest candidates are:
zero(::Type{LibGit2.GitHash}) at /build/julia/src/julia-1.5.0/usr/share/julia/stdlib/v1.5/LibGit2/src/oid.jl:220
zero(::Type{Missing}) at missing.jl:103
zero(::Type{Dates.Date}) at /build/julia/src/julia-1.5.0/usr/share/julia/stdlib/v1.5/Dates/src/types.jl:405
...
Stacktrace:
[1] xparse at /home/yakir/.julia/packages/Parsers/DAskp/src/Parsers.jl:752 [inlined]
[2] parsevalue!(::Type{FilePathsBase.PosixPath}, ::UInt8, ::SentinelArrays.SentinelArray{FilePathsBase.PosixPath,1,UndefInitializer,Missing,Array{FilePathsBase.PosixPath,1}}, ::Array{AbstractArray{T,1} where T,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}, ::Int64, ::Int64, ::Int64, ::Array{Type,1}, ::Array{UInt8,1}) at /home/yakir/.julia/packages/CSV/MKemC/src/file.jl:914
[3] macro expansion at /home/yakir/.julia/packages/CSV/MKemC/src/file.jl:634 [inlined]
[4] parsecustom! at /home/yakir/.julia/packages/CSV/MKemC/src/file.jl:624 [inlined]
[5] parserow at /home/yakir/.julia/packages/CSV/MKemC/src/file.jl:683 [inlined]
[6] parsefilechunk!(::Val{false}, ::Int64, ::Dict{Type,Type}, ::Array{AbstractArray{T,1} where T,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Int64, ::Array{Int64,1}, ::Float64, ::Array{CSV.RefPool,1}, ::Int64, ::Int64, ::Array{Type,1}, ::Array{UInt8,1}, ::Bool, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}, ::Nothing, ::Type{Tuple{Tuple{SentinelArrays.SentinelArray{FilePathsBase.PosixPath,1,UndefInitializer,Missing,Array{FilePathsBase.PosixPath,1}},FilePathsBase.PosixPath},Tuple{SentinelArrays.SentinelArray{FilePathsBase.PosixPath,1,UndefInitializer,Missing,Array{FilePathsBase.PosixPath,1}},FilePathsBase.PosixPath},Tuple{SentinelArrays.SentinelArray{FilePathsBase.PosixPath,1,UndefInitializer,Missing,Array{FilePathsBase.PosixPath,1}},FilePathsBase.PosixPath}}}) at /home/yakir/.julia/packages/CSV/MKemC/src/file.jl:557
[7] CSV.File(::CSV.Header{false,Parsers.Options{false,false,true,false,Missing,UInt8,Nothing},Array{UInt8,1}}; startingbyteposition::Nothing, endingbyteposition::Nothing, limit::Nothing, threaded::Nothing, typemap::Dict{Type,Type}, tasks::Int64, debug::Bool) at /home/yakir/.julia/packages/CSV/MKemC/src/file.jl:265
[8] CSV.File(::String; header::Int64, normalizenames::Bool, datarow::Int64, skipto::Nothing, footerskip::Int64, transpose::Bool, comment::Nothing, use_mmap::Nothing, ignoreemptylines::Bool, select::Nothing, drop::Nothing, missingstrings::Array{String,1}, missingstring::String, delim::Nothing, ignorerepeated::Bool, quotechar::Char, openquotechar::Nothing, closequotechar::Nothing, escapechar::Char, dateformat::Nothing, dateformats::Nothing, decimal::UInt8, truestrings::Array{String,1}, falsestrings::Array{String,1}, type::Nothing, types::Dict{Symbol,DataType}, typemap::Dict{Type,Type}, categorical::Nothing, pool::Float64, lazystrings::Bool, strict::Bool, silencewarnings::Bool, debug::Bool, parsingdebug::Bool, kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/yakir/.julia/packages/CSV/MKemC/src/file.jl:217
[9] loadcsv(::String) at /home/yakir/MAT2db.jl/src/MAT2db.jl:31
[10] process_csv(::String) at /home/yakir/MAT2db.jl/src/MAT2db.jl:43
[11] top-level scope at ./timing.jl:174 [inlined]
[12] top-level scope at ./REPL[4]:0
[13] run_repl(::REPL.AbstractREPL, ::Any) at /build/julia/src/julia-1.5.0/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:288
The tag name "0.v.21" is not of the appropriate SemVer form (vX.Y.Z).
cc: @quinnj
Parsers.jl is a large source of TTFX in the Julia ecosystem.
The previous precompilation PR to resolve this, #108, was reverted due to bugs in Base precompilation on Windows (JuliaData/CSV.jl#994), so the problem remains.
This issue is to track the problem explicitly. My main questions are:
Two major sources seem to be the Union type fields in Options, and the size of the Vector constants being inlined everywhere to set float precision. Moving these to non-inlined methods for the first saves a lot of compilation time (easy); using a stable struct for options with boolean check values instead of runtime isa helps the second.
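The difference between the two option layouts can be illustrated with a pair of hypothetical structs. These are not the actual Parsers.Options definition; the field names are assumptions chosen only to show a Union-typed field next to a concretely typed field guarded by a Bool.

```julia
# Hypothetical option types for illustration; not the real Parsers.Options.
struct UnionOptions
    sentinel::Union{Nothing, Vector{String}}   # "no sentinel" encoded in the type
end

struct FlagOptions
    has_sentinel::Bool                          # "no sentinel" encoded as a flag
    sentinel::Vector{String}                    # empty when has_sentinel is false
end

# Runtime type check on a Union-typed field
uses_sentinel(o::UnionOptions) = o.sentinel isa Vector{String}
# Plain boolean check on a concretely typed struct
uses_sentinel(o::FlagOptions) = o.has_sentinel

isconcretetype(fieldtype(UnionOptions, :sentinel))   # false
isconcretetype(fieldtype(FlagOptions, :sentinel))    # true
```

Whether the second layout reduces compile time in practice would need to be measured on the package itself; the sketch only shows the structural change described above.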
Using the unreleased version of Parsers.jl post-#127 the following test cases from InlineStrings.jl fail (these are cut down from the InlineStrings.jl tests, to see the failures in the context of the full testset see https://github.com/JuliaData/Parsers.jl/actions/runs/3300589688/jobs/5445241946).
I think the first of these might be considered a bug in Parsers.jl; the rest I really don't know. They may all be fine, but I think they are worth reviewing before we make a new release (to decide if it should be marked breaking and/or how to update InlineStrings.jl).
# test/fails.jl
using InlineStrings, Test, Parsers
using Parsers: SENTINEL, OK, EOF, OVERFLOW, QUOTED, DELIMITED, INVALID_DELIMITER, INVALID_QUOTED_FIELD, ESCAPED_STRING, NEWLINE, SUCCESS
@testset begin
    testcases = [
        # Failure due to parsing to a different value!
        ("\"a", InlineString7(), NamedTuple(), OK | QUOTED | INVALID_QUOTED_FIELD | EOF), # invalid quoted
        # Failure due to added ESCAPED_STRING code
        ("\"\\", InlineString7(), (; escapechar=UInt8('\\')), OK | QUOTED | INVALID_QUOTED_FIELD | EOF), # \\ e, invalid quoted
        # Failure due to added OK code
        ("NA", InlineString7(), (; sentinel=["NA"]), EOF | SENTINEL), # sentinel
        # Failures due to no EOF code
        ("\"\",", InlineString7(), NamedTuple(), OK | QUOTED | EOF | DELIMITED), # same e & cq
        ("\"a\",", InlineString7("a"), NamedTuple(), OK | QUOTED | EOF | DELIMITED), # quoted
        ("a,", InlineString7("a"), NamedTuple(), OK | EOF | DELIMITED),
        ("a__", InlineString7("a"), (; delim="__"), OK | EOF | DELIMITED),
        ("a,", InlineString7("a"), (; ignorerepeated=true), OK | EOF | DELIMITED),
        ("a__", InlineString7("a"), (; delim="__", ignorerepeated=true), OK | EOF | DELIMITED),
    ]
    for (i, case) in enumerate(testcases)
        println("\n---")
        println("testing case = $i")
        buf, check, opts, checkcode = case
        res = Parsers.xparse(InlineString7, buf; opts...)
        @show buf
        if !(check == res.val)
            @show check
            @show res.val
        end
        @test check === res.val
        if !(checkcode == res.code)
            @show Parsers.codes(checkcode)
            @show Parsers.codes(res.code)
        end
        @test checkcode == res.code
    end
end
julia> include("test/fails.jl")
---
testing case = 1
buf = "\"a"
check = ""
res.val = "a"
test set: Test Failed at /Users/nickr/repos/InlineStrings.jl/test/fails.jl:34
Expression: check === res.val
Evaluated: "" === "a"
Stacktrace:
[1] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
[2] macro expansion
@ ~/repos/InlineStrings.jl/test/fails.jl:34 [inlined]
[3] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]
[4] top-level scope
@ ~/repos/InlineStrings.jl/test/fails.jl:7
---
testing case = 2
buf = "\"\\"
Parsers.codes(checkcode) = "INVALID: OK | QUOTED | EOF | INVALID_QUOTED_FIELD "
Parsers.codes(res.code) = "INVALID: OK | QUOTED | ESCAPED_STRING | EOF | INVALID_QUOTED_FIELD "
test set: Test Failed at /Users/nickr/repos/InlineStrings.jl/test/fails.jl:40
Expression: checkcode == res.code
Evaluated: -32667 == -32155
Stacktrace:
[1] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
[2] macro expansion
@ ~/repos/InlineStrings.jl/test/fails.jl:40 [inlined]
[3] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]
[4] top-level scope
@ ~/repos/InlineStrings.jl/test/fails.jl:7
---
testing case = 3
buf = "NA"
Parsers.codes(checkcode) = "SUCCESS: SENTINEL | EOF "
Parsers.codes(res.code) = "SUCCESS: OK | SENTINEL | EOF "
test set: Test Failed at /Users/nickr/repos/InlineStrings.jl/test/fails.jl:40
Expression: checkcode == res.code
Evaluated: 34 == 35
Stacktrace:
[1] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
[2] macro expansion
@ ~/repos/InlineStrings.jl/test/fails.jl:40 [inlined]
[3] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]
[4] top-level scope
@ ~/repos/InlineStrings.jl/test/fails.jl:7
---
testing case = 4
buf = "\"\","
Parsers.codes(checkcode) = "SUCCESS: OK | QUOTED | DELIMITED | EOF "
Parsers.codes(res.code) = "SUCCESS: OK | QUOTED | DELIMITED "
test set: Test Failed at /Users/nickr/repos/InlineStrings.jl/test/fails.jl:40
Expression: checkcode == res.code
Evaluated: 45 == 13
Stacktrace:
[1] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
[2] macro expansion
@ ~/repos/InlineStrings.jl/test/fails.jl:40 [inlined]
[3] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]
[4] top-level scope
@ ~/repos/InlineStrings.jl/test/fails.jl:7
---
testing case = 5
buf = "\"a\","
Parsers.codes(checkcode) = "SUCCESS: OK | QUOTED | DELIMITED | EOF "
Parsers.codes(res.code) = "SUCCESS: OK | QUOTED | DELIMITED "
test set: Test Failed at /Users/nickr/repos/InlineStrings.jl/test/fails.jl:40
Expression: checkcode == res.code
Evaluated: 45 == 13
Stacktrace:
[1] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
[2] macro expansion
@ ~/repos/InlineStrings.jl/test/fails.jl:40 [inlined]
[3] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]
[4] top-level scope
@ ~/repos/InlineStrings.jl/test/fails.jl:7
---
testing case = 6
buf = "a,"
Parsers.codes(checkcode) = "SUCCESS: OK | DELIMITED | EOF "
Parsers.codes(res.code) = "SUCCESS: OK | DELIMITED "
test set: Test Failed at /Users/nickr/repos/InlineStrings.jl/test/fails.jl:40
Expression: checkcode == res.code
Evaluated: 41 == 9
Stacktrace:
[1] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
[2] macro expansion
@ ~/repos/InlineStrings.jl/test/fails.jl:40 [inlined]
[3] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]
[4] top-level scope
@ ~/repos/InlineStrings.jl/test/fails.jl:7
---
testing case = 7
buf = "a__"
Parsers.codes(checkcode) = "SUCCESS: OK | DELIMITED | EOF "
Parsers.codes(res.code) = "SUCCESS: OK | DELIMITED "
test set: Test Failed at /Users/nickr/repos/InlineStrings.jl/test/fails.jl:40
Expression: checkcode == res.code
Evaluated: 41 == 9
Stacktrace:
[1] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
[2] macro expansion
@ ~/repos/InlineStrings.jl/test/fails.jl:40 [inlined]
[3] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]
[4] top-level scope
@ ~/repos/InlineStrings.jl/test/fails.jl:7
---
testing case = 8
buf = "a,"
Parsers.codes(checkcode) = "SUCCESS: OK | DELIMITED | EOF "
Parsers.codes(res.code) = "SUCCESS: OK | DELIMITED "
test set: Test Failed at /Users/nickr/repos/InlineStrings.jl/test/fails.jl:40
Expression: checkcode == res.code
Evaluated: 41 == 9
Stacktrace:
[1] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
[2] macro expansion
@ ~/repos/InlineStrings.jl/test/fails.jl:40 [inlined]
[3] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]
[4] top-level scope
@ ~/repos/InlineStrings.jl/test/fails.jl:7
---
testing case = 9
buf = "a__"
Parsers.codes(checkcode) = "SUCCESS: OK | DELIMITED | EOF "
Parsers.codes(res.code) = "SUCCESS: OK | DELIMITED "
test set: Test Failed at /Users/nickr/repos/InlineStrings.jl/test/fails.jl:40
Expression: checkcode == res.code
Evaluated: 41 == 9
Stacktrace:
[1] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
[2] macro expansion
@ ~/repos/InlineStrings.jl/test/fails.jl:40 [inlined]
[3] macro expansion
@ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]
[4] top-level scope
@ ~/repos/InlineStrings.jl/test/fails.jl:7
Test Summary: | Pass Fail Total Time
test set | 9 9 18 0.0s
ERROR: LoadError: Some tests did not pass: 9 passed, 9 failed, 0 errored, 0 broken.
in expression starting at /Users/nickr/repos/InlineStrings.jl/test/fails.jl:6
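The integer codes in the failing comparisons above can be decoded by hand. The sketch below assumes flag values inferred from the log lines themselves (OK = 1, SENTINEL = 2, QUOTED = 4, DELIMITED = 8, EOF = 32), not read from the Parsers.jl source; `FLAGS`, `decode` and `difference` are hypothetical names used only for illustration.

```julia
# Flag values inferred from the logs (e.g. 9 = OK | DELIMITED, 41 = OK | DELIMITED | EOF).
# They are assumptions for illustration, not taken from the Parsers.jl source.
const FLAGS = (("OK", 1), ("SENTINEL", 2), ("QUOTED", 4), ("DELIMITED", 8), ("EOF", 32))

# Names of the flags that are set in `code`.
decode(code::Integer) = [name for (name, bit) in FLAGS if (code & bit) != 0]

# Flags on which two codes disagree: the set bits of their xor.
difference(a::Integer, b::Integer) = decode(xor(a, b))

println(decode(41))           # flags in the expected code of case 6
println(difference(41, 9))    # what separates expected from actual in cases 6 to 9
println(difference(45, 13))   # cases 4 and 5
println(difference(34, 35))   # case 3
```

Under these assumptions, cases 4 to 9 differ only in the EOF flag, while case 3 differs only in the OK flag.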
I tried changing some code so that it parses integer data into the smallest Integer type that will fit it, rather than Int64 (I know the integers will be small, e.g. if I know they'll be single-digit integers 0-9, I set it to use Int8, and so on).
And I saw the parsing time increase compared to parsing them as Int64...
Is it expected that parsing integer data into Int will be faster than into smaller Integer types?
This tiny benchmark seems to match what I see:
julia> using Parsers, BenchmarkTools
julia> buf = Vector{UInt8}("123");
julia> pos, len = 1, 3;
julia> opts = Parsers.Options()
Parsers.Options([""], nothing, false, false, 0x20, 0x09, false, 0x22, 0x22, 0x22, nothing, 0x2e, nothing, nothing, nothing, nothing)
julia> @btime Parsers.xparse(Int64, buf, pos, len, opts);
149.694 ns (1 allocation: 32 bytes)
julia> @btime Parsers.xparse(Int32, buf, pos, len, opts);
155.153 ns (1 allocation: 32 bytes)
julia> @btime Parsers.xparse(Int16, buf, pos, len, opts);
150.374 ns (1 allocation: 32 bytes)
julia> @btime Parsers.xparse(Int8, buf, pos, len, opts);
152.745 ns (1 allocation: 32 bytes)
(jl_41hlU2) pkg> st
Status `/private/var/folders/hx/1h0bbkfd18d4n1qrnwmrl4j00000gn/T/jl_41hlU2/Project.toml`
[6e4b80f9] BenchmarkTools v1.2.0
[69de0a69] Parsers v2.0.3
And it seems to scale up similarly, e.g. if using CSV.jl to read in some data that's all integers:
julia> using CSV, BenchmarkTools
julia> @btime CSV.File("ints.csv", types=Int64, delim=',');
160.968 μs (378 allocations: 22.48 KiB)
julia> @btime CSV.File("ints.csv", types=Int32, delim=',');
178.616 μs (342 allocations: 20.89 KiB)
julia> @btime CSV.File("ints.csv", types=Int16, delim=',');
173.254 μs (342 allocations: 19.62 KiB)
julia> @btime CSV.File("ints.csv", types=Int8, delim=',');
170.279 μs (342 allocations: 18.92 KiB)
(jl_zrE6yO) pkg> st
Status `/private/var/folders/hx/1h0bbkfd18d4n1qrnwmrl4j00000gn/T/jl_zrE6yO/Project.toml`
[6e4b80f9] BenchmarkTools v1.2.0
[336ed68f] CSV v0.9.3
julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
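For readers unfamiliar with the "smallest Integer type that will fit" idea described in this issue, here is a minimal sketch using only Base Julia. `smallest_int_type` is a hypothetical helper, not part of Parsers.jl or CSV.jl, and it only considers signed types up to Int64.

```julia
# Hypothetical helper: pick the narrowest signed integer type
# whose range covers every value in [lo, hi].
function smallest_int_type(lo::Integer, hi::Integer)
    lo <= hi || throw(ArgumentError("lo must be <= hi"))
    for T in (Int8, Int16, Int32, Int64)
        if typemin(T) <= lo && hi <= typemax(T)
            return T
        end
    end
    throw(ArgumentError("range does not fit in Int64"))
end

println(smallest_int_type(0, 9))        # single-digit integers
println(smallest_int_type(-40_000, 5))  # needs more than 16 bits
```

The chosen type could then be passed wherever a target type is accepted, such as the `types` keyword in the CSV.File calls shown in the benchmark.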
I think 3 times recently I have gotten into a confused state where I have added the package master branch (instead of the current default branch) or checked it out locally in git. Then things are super weird for a while until I realize I am on the wrong branch. Perhaps the old default branch should just be removed?
With stripwhitespace = true (xref #105), I would expect quoted strings to have both leading and trailing whitespace stripped, but when the quoted string is followed by irrelevant characters the trailing whitespace is left:
Setup:
julia> using Parsers # v2.2.0
julia> using InlineStrings # InlineStrings here just so easier to see the result than with String
julia> opts = Parsers.Options(
stripwhitespace=true,
quoted=true,
openquotechar='\'',
closequotechar='\'',
sentinel=missing,
delim=',',
);
Works as expected (gives "ABC"):
julia> buf = b"'ABC '";
julia> res = Parsers.xparse(InlineString7, buf, 1, length(buf), opts)
Parsers.Result{String7}(37, 6, "ABC")
With random trailing characters after the quoted string, it leaves the trailing whitespace (gives "ABC "):
julia> buf = b"'ABC ' **";
julia> res = Parsers.xparse(InlineString7, buf, 1, length(buf), opts)
Parsers.Result{String7}(37, 9, "ABC ")
--
context is nickrobinson251/PowerFlowData.jl#44 / nickrobinson251/PowerFlowData.jl#62
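To make the expected behaviour concrete, here is a minimal Base-only sketch of what "strip whitespace inside the quotes, regardless of what follows the closing quote" would produce for the two buffers in this issue. `strip_quoted_field` is a hypothetical helper, it is not how Parsers.jl implements quoting, and it ignores escape characters and delimiters entirely.

```julia
# Hypothetical helper: return the content between the first pair of quote
# characters with leading/trailing whitespace removed, or `nothing` if the
# buffer does not contain a complete quoted field.
function strip_quoted_field(buf::AbstractVector{UInt8}; quotechar::UInt8=UInt8('\''))
    i = findfirst(==(quotechar), buf)
    i === nothing && return nothing
    j = findnext(==(quotechar), buf, i + 1)
    j === nothing && return nothing
    inner = String(buf[i+1:j-1])   # copy of the bytes between the quotes
    return String(strip(inner))
end

println(repr(strip_quoted_field(b"'ABC '")))
println(repr(strip_quoted_field(b"'ABC ' **")))
```

In this sketch both buffers yield the same value, because anything after the closing quote is never inspected.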
Building a website with Quarto failed when using Parsers v2.5.7, with the following error message:
An error occurred while executing the following cell:
------------------
using CSV
using DataFrames
csv_path = joinpath("working_directory", "data", "scope_0.csv")
df = CSV.read(csv_path, DataFrame, header = [1,2])
------------------
MethodError: no method matching iterate(::Parsers.Token)
Closest candidates are:
iterate(::Union{LinRange, StepRangeLen}) at /usr/local/julia/share/julia/base/range.jl:826
iterate(::Union{LinRange, StepRangeLen}, ::Integer) at /usr/local/julia/share/julia/base/range.jl:826
iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} at /usr/local/julia/share/julia/base/dict.jl:695
...
Stacktrace:
[1] indexed_iterate(I::Parsers.Token, i::Int64)
@ Base ./tuple.jl:92
[2] checkcommentandemptyline(buf::Vector{UInt8}, pos::Int64, len::Int64, cmt::Any, ignoreemptyrows::Bool, nlines::Base.RefValue{Int64})
@ CSV ~/.julia/packages/CSV/jFiCn/src/detection.jl:276
[3] skiptorow(buf::Vector{UInt8}, pos::Int64, len::Int64, oq::Parsers.Token, eq::UInt8, cq::Parsers.Token, cmt::Any, ignoreemptyrows::Bool, cur::Int64, dest::Int64)
@ CSV ~/.julia/packages/CSV/jFiCn/src/detection.jl:191
[4] detectcolumnnames(buf::Vector{UInt8}, headerpos::Int64, datapos::Int64, len::Int64, options::Parsers.Options, header::Any, normalizenames::Bool)
@ CSV ~/.julia/packages/CSV/jFiCn/src/detection.jl:178
[5] CSV.Context(source::CSV.Arg, header::CSV.Arg, normalizenames::CSV.Arg, datarow::CSV.Arg, skipto::CSV.Arg, footerskip::CSV.Arg, transpose::CSV.Arg, comment::CSV.Arg, ignoreemptyrows::CSV.Arg, ignoreemptylines::CSV.Arg, select::CSV.Arg, drop::CSV.Arg, limit::CSV.Arg, buffer_in_memory::CSV.Arg, threaded::CSV.Arg, ntasks::CSV.Arg, tasks::CSV.Arg, rows_to_check::CSV.Arg, lines_to_check::CSV.Arg, missingstrings::CSV.Arg, missingstring::CSV.Arg, delim::CSV.Arg, ignorerepeated::CSV.Arg, quoted::CSV.Arg, quotechar::CSV.Arg, openquotechar::CSV.Arg, closequotechar::CSV.Arg, escapechar::CSV.Arg, dateformat::CSV.Arg, dateformats::CSV.Arg, decimal::CSV.Arg, truestrings::CSV.Arg, falsestrings::CSV.Arg, stripwhitespace::CSV.Arg, type::CSV.Arg, types::CSV.Arg, typemap::CSV.Arg, pool::CSV.Arg, downcast::CSV.Arg, lazystrings::CSV.Arg, stringtype::CSV.Arg, strict::CSV.Arg, silencewarnings::CSV.Arg, maxwarnings::CSV.Arg, debug::CSV.Arg, parsingdebug::CSV.Arg, validate::CSV.Arg, streaming::CSV.Arg)
@ CSV ~/.julia/packages/CSV/jFiCn/src/context.jl:392
[6] #File#25
@ ~/.julia/packages/CSV/jFiCn/src/file.jl:221 [inlined]
[7] read(source::String, sink::Type; copycols::Bool, kwargs::Base.Pairs{Symbol, Vector{Int64}, Tuple{Symbol}, NamedTuple{(:header,), Tuple{Vector{Int64}}}})
@ CSV ~/.julia/packages/CSV/jFiCn/src/CSV.jl:91
[8] top-level scope
@ In[4]:5
[9] eval
@ ./boot.jl:373 [inlined]
[10] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1196
LoadError: MethodError: no method matching iterate(::Parsers.Token)
Closest candidates are:
iterate(::Union{LinRange, StepRangeLen}) at /usr/local/julia/share/julia/base/range.jl:826
iterate(::Union{LinRange, StepRangeLen}, ::Integer) at /usr/local/julia/share/julia/base/range.jl:826
iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} at /usr/local/julia/share/julia/base/dict.jl:695
Installing Parsers v2.3.2 fixed the failure. Pipeline failure
Also attempt to bring back some precompilation
Hi there, I have been redirected from Slack to post an issue here.
I have had some trouble loading a large CSV file with large values, e.g. a CSV file with 2 million observations and 4 columns of large Float64 values. BigFloat is much worse.
julia> @time df = DataFrames.DataFrame(CSV.File("/Users/jakeireland/Desktop/hamming_bound_integers_2000000.csv"))
344.206327 seconds (1.97 G allocations: 38.920 GiB, 21.82% gc time)
julia> @time df = DataFrames.DataFrame(CSV.read("/Users/jakeireland/Desktop/hamming_bound_integers_2000000.csv"))
545.099823 seconds (1.97 G allocations: 38.958 GiB, 22.85% gc time)
This is much faster in base R, and faster again using readr.
I am trying to rely on xparse to correctly parse a value when I know the input contains invalid characters (i.e. an invalid delimiter). I am hoping/expecting to get the correct value and an INVALID_DELIMITER return code.
(I gather we do want it to be possible to rely on xparse in the presence of invalid delimiters, given #78).
But xparse doesn't always return the correct value (using Parsers.jl v2.0.6).
For example, when trying to parse a Float64 when there are special characters like /:
julia> using Parsers
julia> buf = codeunits("1.0 /");
julia> res = Parsers.xparse(Float64, buf, 1, length(buf), Parsers.XOPTIONS)
Parsers.Result{Float64}(-32607, 5, 2.3255508133e-314)
julia> res.val, Parsers.codes(res.code)
(2.3255508133e-314, "INVALID: OK | EOF | INVALID_DELIMITER ")
Here xparse returned the expected code (INVALID_DELIMITER), but not the correct value (expected is res.val === 1.0).
The internal xparse2 gives the correct value, suggesting the typeparser actually does extract the correct value (and the "incorrect" code is due to simplifications in xparse2 and doesn't matter here):
julia> res = Parsers.xparse2(Float64, buf, 1, length(buf), Parsers.XOPTIONS)
Parsers.Result{Float64}(-32735, 5, 1.0)
julia> res.val, Parsers.codes(res.code)
(1.0, "INVALID: OK | EOF ")
And calling typeparser directly, I see the correct value (as expected):
julia> b, code = buf[1], Parsers.SUCCESS;
julia> Parsers.typeparser(Float64, buf, 1, length(buf), b, code, Parsers.XOPTIONS)
(1.0, 1, 4)
This isn't specific to the / character or to Float64, e.g. parsing Int64s:
julia> buf = codeunits("2 _");
julia> res = Parsers.xparse(Int64, buf, 1, length(buf), Parsers.XOPTIONS)
Parsers.Result{Int64}(-32607, 3, 4738866224)
julia> res.val, Parsers.codes(res.code)
(4738866224, "INVALID: OK | EOF | INVALID_DELIMITER ")
julia> Parsers.typeparser(Int64, buf, 1, length(buf), buf[1], code, Parsers.XOPTIONS)
(2, 1, 2)
julia> buf = codeunits("3 *");
julia> res = Parsers.xparse(Int64, buf, 1, length(buf), Parsers.XOPTIONS)
Parsers.Result{Int64}(-32607, 3, 4738866224)
julia> res.val, Parsers.codes(res.code)
(4738866224, "INVALID: OK | EOF | INVALID_DELIMITER ")
julia> Parsers.typeparser(Int64, buf, 1, length(buf), buf[1], code, Parsers.XOPTIONS)
(3, 1, 2)
So I suspect this isn't to do with the typeparsers, but with the logic for handling invalid cases in xparse.
In particular, I think it's because xparse doesn't populate the value when the code is not ok:
1. typeparser returns the correct value
2. xparse correctly sets the code to INVALID_DELIMITER and sends us to donedone
Lines 532 to 540 in 6b560d4
3. donedone checks if ok(code) (which is false) and then doesn't pass the value to Result
Lines 659 to 666 in 6b560d4
So we have everything we need... but we're not using it.
I think donedone might be doing this to handle the cases where we get sent to donedone before we've even called typeparser (e.g. because we hit "end of file" before hitting non-whitespace characters).
If this diagnosis is correct, I wonder if we should just handle that explicitly, rather than checking ok(code), e.g. via a different goto-label:
+@label earlydone
+ # earlydone means parsing finished before calling `typeparser(T, ...)` to parse a `value::T`
+ tlen = pos - startpos
+ return Result{S}(code, tlen)
+
@label donedone
tlen = pos - startpos
- if ok(code)
- y::T = x
- return Result{S}(code, tlen, y)
- else
- return Result{S}(code, tlen)
- end
+ y::T = x
+ return Result{S}(code, tlen, y)
Specifically #130 breaks the tests for InlineStrings.jl (JuliaStrings/InlineStrings.jl#48)
On v2.4.0
julia> res = Parsers.xparse(InlineString7, "")
Parsers.Result{String7}(33, 0, "")
julia> res.val
""
On v2.4.1
julia> res = Parsers.xparse(InlineString7, "")
Parsers.Result{PosLen}(33, 0, PosLen(0x0000000000100000))
julia> res.val
PosLen(0x0000000000100000)
Looks like tryparse just needs to be defined to accept AbstractString... is there any reason not to?
julia> Parsers.tryparse(split("1,2", ",")[1], Float64)
ERROR: MethodError: no method matching tryparse(::SubString{String}, ::Type{Float64})
Closest candidates are:
tryparse(::String, ::Type{T}; kwargs...) where T at /Users/tomkwong/.julia/packages/Parsers/oDXb6/src/Parsers.jl:262
tryparse(::IO, ::Type{T}; kwargs...) where T at /Users/tomkwong/.julia/packages/Parsers/oDXb6/src/Parsers.jl:266
Stacktrace:
[1] top-level scope at none:0
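Until a method accepting AbstractString exists, one workaround is to copy the SubString into a String so that the `::String` method listed in the error applies. The sketch below uses only Base Julia and does not call Parsers; it shows the conversion, and that Base's own tryparse already accepts a SubString.

```julia
field = split("1,2", ",")[1]

println(typeof(field))                  # SubString{String}
println(String(field) isa String)       # true: String(field) copies into a plain String
println(Base.tryparse(Float64, field))  # 1.0: Base accepts the SubString directly
```

The String(field) copy allocates, which is presumably why an AbstractString method would be preferable in a hot parsing loop.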
I think we have too many different xparse methods that set different defaults.
julia> Parsers.xparse(String, str) # == Parsers.xparse(String, str; quoted=true)
options.quoted = true
Parsers.Result{PosLen}(-32603, 15, PosLen(0x0000000000200003))
julia> Parsers.xparse(String, str, 1, sizeof(str))
options.quoted = true
Parsers.Result{PosLen}(-32603, 15, PosLen(0x0000000000200003))
julia> Parsers.xparse(String, str, 1, sizeof(str), Parsers.Options())
options.quoted = false
Parsers.Result{PosLen}(33, 15, PosLen(0x000000000010000f))
julia> Parsers.xparse(String, str, 1, sizeof(str), Parsers.Options(; quoted=true))
options.quoted = true
Parsers.Result{PosLen}(5, 6, PosLen(0x0000000000200003))
The first hits this, which passes quoted::Bool=true:
Lines 211 to 212 in e2259a6
The second hits this, which uses Parsers.XOPTIONS:
Lines 217 to 218 in e2259a6
XOPTIONS has quoted=true:
Line 164 in e2259a6
(Why does XOPTIONS exist?)
The third hits the same method as 2, but passing in Parsers.Options(), which has quoted=false:
Line 157 in e2259a6
The fourth hits the same method as 2/3, but passes in Parsers.Options(; quoted=true) to set that explicitly... but returns a different answer to 1 (xparse(String, str; quoted=true)), because Options() defaults to delim=nothing whereas 1 sets delim=UInt8(','):
Line 149 in e2259a6
This is now very off-topic from your original issue (sorry), so we can move it to a new issue, but I think perhaps we could simplify the xparse interface to make this whole thing a little less confusing / more explicit.
I think I'd be in favour of requiring a user-given ::Options argument.
cc @quinnj
Originally posted by @nickrobinson251 in #119 (comment)
What is this package's license?
When updating from Parsers 0.2.18 to 0.2.20 I get the following when parsing a CSV file:
realloc(): invalid pointer
signal (6): Aborted
in expression starting at no file:0
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
__libc_message at /lib64/libc.so.6 (unknown line)
malloc_printerr at /lib64/libc.so.6 (unknown line)
realloc at /lib64/libc.so.6 (unknown line)
jl_gc_counted_realloc_with_old_size at /buildworker/worker/package_linux64/build/src/gc.c:2777
__gmpz_realloc at /usr/local/julia/bin/../lib/julia/libgmp.so (unknown line)
__gmpz_mul_2exp at /usr/local/julia/bin/../lib/julia/libgmp.so (unknown line)
mul_2exp! at ./gmp.jl:146 [inlined]
mul_2exp! at ./gmp.jl:148 [inlined]
scale at /root/.julia/packages/Parsers/v5u2B/src/floats.jl:96 [inlined]
scale at /root/.julia/packages/Parsers/v5u2B/src/floats.jl:114 [inlined]
#_defaultparser#46 at /root/.julia/packages/Parsers/v5u2B/src/floats.jl:255 [inlined]
_defaultparser at /root/.julia/packages/Parsers/v5u2B/src/floats.jl:126 [inlined]
#defaultparser#45 at /root/.julia/packages/Parsers/v5u2B/src/floats.jl:123 [inlined]
defaultparser at /root/.julia/packages/Parsers/v5u2B/src/floats.jl:123 [inlined]
#parse!#13 at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:349 [inlined]
parse! at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:349 [inlined]
#parse!#29 at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:644 [inlined]
parse! at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:642 [inlined]
#parse!#28 at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:618 [inlined]
parse! at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:616 [inlined]
#parse!#27 at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:572 [inlined]
parse! at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:547 [inlined]
#parse!#26 at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:448 [inlined]
parse! at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:448 [inlined]
#parse#15 at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:351 [inlined]
parse at /root/.julia/packages/Parsers/v5u2B/src/Parsers.jl:351 [inlined]
parsefield at /root/.julia/packages/CSV/eWuJV/src/tables.jl:88 [inlined]
getproperty at /root/.julia/packages/CSV/eWuJV/src/tables.jl:182
getproperty at /root/.julia/packages/CSV/eWuJV/src/tables.jl:148 [inlined]
macro expansion at /root/.julia/packages/Tables/qIlOP/src/utils.jl:55 [inlined]
eachcolumn at /root/.julia/packages/Tables/qIlOP/src/utils.jl:47 [inlined]
buildcolumns at /root/.julia/packages/Tables/qIlOP/src/fallbacks.jl:95
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1831
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
columns at /root/.julia/packages/Tables/qIlOP/src/fallbacks.jl:149 [inlined]
Type at /root/.julia/packages/DataFrames/IKMvt/src/other/tables.jl:21
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1831
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
|> at ./operators.jl:813
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1831
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
#read#105 at /root/.julia/packages/CSV/eWuJV/src/CSV.jl:315
unknown function (ip: 0x7fa269275a4b)
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1831
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
#read at ./none:0
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1831
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
...
I'll note that this error is occurring in a fairly deep application. I can try to make a minimal reproducible test if requested.
julia> Parsers.tryparse(Int, "10.2")
10
Is there an option to make this fail (return nothing)?
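For comparison, Base Julia's own tryparse is strict about trailing characters. This is a statement about Base only, not about any option in Parsers.jl:

```julia
println(Base.tryparse(Int, "10.2"))  # nothing: "10.2" is not a valid Int literal
println(Base.tryparse(Int, "10"))    # 10
```

Falling back to Base.tryparse would be one way to get the rejecting behaviour where the parsing speed of Parsers is not needed.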