stringencodings.jl's People

Contributors

ararslan, davidanthoff, juliatagbot, marlin-na, nalimilan, nosferican, sambitdash, scottpjones, staticfloat, stevengj, tkelman

stringencodings.jl's Issues

Pkg.build fails on 0.7-master

I'm having trouble getting the package to build on the latest master branch. I had WinRPM problems earlier, but those have just been resolved, so WinRPM itself should be working correctly.

Pkg.build("StringEncodings")

# a million deprecation warnings...
# ...
# ...

INFO: Packages to install: win_iconv-dll
WARNING: Base.Void is deprecated, use Nothing instead.
  likely near C:\Users\tbeason\.julia\v0.7\StringEncodings\deps\build.jl:976
WARNING: Base.Void is deprecated, use Nothing instead.
  likely near C:\Users\tbeason\.julia\v0.7\StringEncodings\deps\build.jl:976
WARNING: Base.Void is deprecated, use Nothing instead.
  likely near C:\Users\tbeason\.julia\v0.7\StringEncodings\deps\build.jl:976
WARNING: Base.Void is deprecated, use Nothing instead.
  likely near C:\Users\tbeason\.julia\v0.7\StringEncodings\deps\build.jl:976
┌ Warning: `info()` is deprecated, use `@info` instead.
│   caller = do_install(::WinRPM.Package) at WinRPM.jl:454
└ @ WinRPM WinRPM.jl:454
INFO: Downloading: win_iconv-dll
┌ Warning: `info()` is deprecated, use `@info` instead.
│   caller = do_install(::WinRPM.Package) at WinRPM.jl:465
└ @ WinRPM WinRPM.jl:465
INFO: Extracting: win_iconv-dll
┌ Warning: `open(cmd)` now returns only a Process<:IO object.
│   caller = next at deprecated.jl:200 [inlined]
└ @ Core deprecated.jl:200

ERROR: The system cannot find the file specified.
C:\Users\tbeason\.julia\v0.7\WinRPM\cache\2\noarch%2Fmingw64-win_iconv-dll-0.0.8-3.15.noarch.cpio



System ERROR:
The system cannot find the file specified.

7-Zip [64] 16.04 : Copyright (c) 1999-2016 Igor Pavlov : 2016-10-04

Scanning the drive for archives:

┌ Error: ------------------------------------------------------------# Build failed for StringEncodings
│   exception =
│    LoadError: failed process: Process(`'C:\Users\tbeason\AppData\Local\Julia-0.7.0-DEV\bin\7z.exe' x -y 'C:\Users\tbeason\.julia\v0.7\WinRPM\cache\2\noarch%2Fmingw64-win_iconv-dll-0.0.8-3.15.noarch.cpio' '-oC:\Users\tbeason\.julia\v0.7\WinRPM\deps'`, ProcessExited(2)) [2]
│    Stacktrace:
│     [1] error(::String, ::Base.Process, ::String, ::Int64, ::String) at .\error.jl:42
│     [2] pipeline_error(::Base.Process) at .\process.jl:698
│     [3] do_install(::WinRPM.Package) at C:\Users\tbeason\.julia\v0.7\WinRPM\src\WinRPM.jl:483
│     [4] do_install at C:\Users\tbeason\.julia\v0.7\WinRPM\src\WinRPM.jl:445 [inlined]
│     [5] #install#21(::Bool, ::Function, ::WinRPM.Package) at C:\Users\tbeason\.julia\v0.7\WinRPM\src\WinRPM.jl:392
│     [6] #install at .\<missing>:0 [inlined]
│     [7] #install#19 at C:\Users\tbeason\.julia\v0.7\WinRPM\src\WinRPM.jl:361 [inlined]
│     [8] #install at .\<missing>:0 [inlined] (repeats 2 times)
│     [9] (::getfield(WinRPM, Symbol("##36#37")){WinRPM.RPM})() at C:\Users\tbeason\.julia\v0.7\WinRPM\src\winrpm_bindeps.jl:42
│     [10] run(::getfield(WinRPM, Symbol("##36#37")){WinRPM.RPM}) at C:\Users\tbeason\.julia\v0.7\BinDeps\src\BinDeps.jl:484
│     [11] run(::BinDeps.SynchronousStepCollection) at C:\Users\tbeason\.julia\v0.7\BinDeps\src\BinDeps.jl:527
│     [12] satisfy!(::BinDeps.LibraryDependency, ::Array{DataType,1}) at C:\Users\tbeason\.julia\v0.7\BinDeps\src\dependencies.jl:943
│     [13] satisfy!(::BinDeps.LibraryDependency) at C:\Users\tbeason\.julia\v0.7\BinDeps\src\dependencies.jl:921
│     [14] top-level scope at C:\Users\tbeason\.julia\v0.7\BinDeps\src\dependencies.jl:976
│     [15] include(::Module, ::String) at .\boot.jl:292
│     [16] include_relative(::Module, ::String) at .\loading.jl:1012
│     [17] include at .\sysimg.jl:26 [inlined]
│     [18] include(::String) at .\loading.jl:1046
│     [19] top-level scope
│     [20] eval at .\boot.jl:295 [inlined]
│     [21] eval at .\sysimg.jl:71 [inlined]
│     [22] evalfile(::String, ::Array{String,1}) at .\loading.jl:1041 (repeats 2 times)
│     [23] #2 at .\none:15 [inlined]
│     [24] cd(::getfield(, Symbol("##2#5")){String}, ::String) at .\file.jl:59
│     [25] (::getfield(, Symbol("##1#3")))(::IOStream) at .\none:14
│     [26] #open#318(::Base.Iterators.IndexValue{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(, Symbol("##1#3")), ::String, ::Vararg{String,N} where N) at .\iostream.jl:369
│     [27] open(::Function, ::String, ::String) at .\iostream.jl:367
│     [28] top-level scope
│     [29] eval at .\boot.jl:295 [inlined]
│     [30] eval(::Module, ::Expr) at .\sysimg.jl:71
│     [31] exec_options(::Base.JLOptions) at .\client.jl:309
│     [32] _start() at .\client.jl:447
│    in expression starting at C:\Users\tbeason\.julia\v0.7\StringEncodings\deps\build.jl:976
└ @ Main none:18
┌ Warning: ------------------------------------------------------------# Build error summary
│
│ StringEncodings had build errors.
│
│  - packages with build errors remain installed in C:\Users\tbeason\.julia\v0.7
│  - build the package(s) and all dependencies with `Pkg.build("StringEncodings")`
│  - build a single package by running its `deps/build.jl` script
└ @ Pkg.Entry entry.jl:651

Looking at the directory it is searching:

λ ls -1 C:\Users\tbeason\.julia\v0.7\WinRPM\cache\2
mingw64-win_iconv-dll-0.0.8-3.15.noarch.cpio
noarch%2Fmingw64-win_iconv-dll-0.0.8-3.15.noarch.rpm
repodata%2F58a3da7b6a7a7cf1c71a355252e0d9db1aab60162e9017b2235f5fa7a118660f-primary.xml
repodata%2Frepomd.xml

Inaccurate comment of flush

The comment says that flush returns the number of bytes written to output buffer, but it returns the encoder. Is the comment outdated?

# Flush input buffer and convert it into output buffer
# Returns the number of bytes written to output buffer
function flush(s::StringEncoder)
    s.cd == C_NULL && return s
    # We need to retry several times in case output buffer is too small to convert
    # all of the input. Even so, some incomplete sequences may remain in the input
    # until more data is written, which will only trigger an error on close().
    s.outbytesleft[] = 0
    while s.outbytesleft[] < BUFSIZE
        iconv!(s.cd, s.inbuf, s.outbuf, s.inbufptr, s.outbufptr, s.inbytesleft, s.outbytesleft)
        write(s.stream, view(s.outbuf, 1:(BUFSIZE - Int(s.outbytesleft[]))))
    end
    s
end

Make decode() interface accept AbstractVector

Hi,

We work a lot with @view/SubArray on top of a larger buffer. Currently the decode() function requires a Vector{UInt8} argument, which means we have to copy the data inside an inner, performance-optimized loop. Unless there is a specific reason for the restriction, would it be possible to change the interface to accept AbstractVector{UInt8}?

thanks a lot

BinDeps.shlib_ext is deprecated

Warning in Julia v0.6.1:

julia> Pkg.build("StringEncodings")
INFO: Building StringEncodings
WARNING: BinDeps.shlib_ext is deprecated.
  likely near /home/ec2-user/.julia/v0.6/StringEncodings/deps/build.jl:47

julia> versioninfo()
Julia Version 0.6.1
Commit 0d7248e2ff (2017-10-24 22:15 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

Capitalize package name

Please capitalize the package name if you plan on registering it; otherwise it will sort at the very end of the package list. The only other non-capitalized package name is kNN, which is deprecated and doesn't have any tags.

Add tests with strings longer than BUFSIZE

In #40, a crash wasn't caught by the tests, probably because none of them covers strings longer than the buffer. It's essential to test that case too. It's possible that BUFSIZE was increased after the tests were written.

iconv not installed properly (Windows 10)

julia> using StringEncodings
[ Info: Precompiling StringEncodings [69024149-9ee7-55f6-a4c4-859efe599b68]
ERROR: LoadError: iconv not installed properly, run Pkg.build("StringEncodings"), restart Julia and try again
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] top-level scope at C:\Users\JohnDoe\.julia\packages\StringEncodings\B9gIH\src\StringEncodings.jl:10
 [3] include(::Function, ::Module, ::String) at .\Base.jl:380
 [4] include(::Module, ::String) at .\Base.jl:368
 [5] top-level scope at none:2
 [6] eval at .\boot.jl:331 [inlined]
 [7] eval(::Expr) at .\client.jl:467
 [8] top-level scope at .\none:3
in expression starting at C:\Users\JohnDoe\.julia\packages\StringEncodings\B9gIH\src\StringEncodings.jl:9
ERROR: Failed to precompile StringEncodings [69024149-9ee7-55f6-a4c4-859efe599b68] to C:\Users\JohnDoe\.julia\compiled\v1.5\StringEncodings\ACjY3_cqGsg.ji.
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1290
 [3] _require(::Base.PkgId) at .\loading.jl:1030
 [4] require(::Base.PkgId) at .\loading.jl:928
 [5] require(::Module, ::Symbol) at .\loading.jl:923

Pkg.build("StringEncodings") didn't help. Julia versions prior to Julia 1.5.0 on this machine did successfully install and use StringEncodings. Removing and reinstalling StringEncodings didn't work either.

julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 7

Detect non-convertible characters

I have to convert text files from a UTF encoding to ISO-8859-1. Not all characters can be represented in the target encoding, so whenever encode encounters a non-convertible character in a string, it raises an exception.

Raising an exception in this situation is not a good solution for my use case.

Is there a way to test a string in advance for non-convertible characters, and to identify which characters in the string are non-convertible?
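
For the specific ISO-8859-1 case, such a pre-check could perhaps be done in plain Julia without calling the package. ISO-8859-1 covers exactly the code points U+0000 through U+00FF, so any character above U+00FF has no representation. The helper name `nonconvertible` below is hypothetical and not part of StringEncodings.jl, and other target encodings would need a different test.

```julia
# Hypothetical helper, not part of StringEncodings.jl.
# Returns (character position, character) pairs for every character of `s`
# that has no ISO-8859-1 representation, i.e. a code point above U+00FF.
function nonconvertible(s::AbstractString)
    return [(i, c) for (i, c) in enumerate(s) if c > '\u00ff']
end

bad = nonconvertible("naïve €5 ✓")
for (i, c) in bad
    println("character ", i, " (", c, ") cannot be encoded as ISO-8859-1")
end
```

The positions are character counts rather than byte indices, which is usually what a report to a user needs. An empty result means the string should encode without hitting the exception.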

Conversion performance

I've written some simple functions that build conversion tables using iconv.jl and then perform the conversions in pure Julia code instead of calling iconv. I also compared the performance of:

  1. converting from an 8-bit character set to UTF-8 via iconv.jl
  2. converting from an 8-bit character set to UTF-16 via iconv.jl
  3. converting from an 8-bit character set to UTF-16 via https://github.com/nolta/ICU.jl
  4. converting from an 8-bit character set to UTF-8 via my convert code
  5. converting from an 8-bit character set to UTF-16 via my convert code

I've made a Gist with the benchmark results (using https://github.com/johnmyleswhite/Benchmarks.jl),
along with the conversion code and the benchmarking code, at:
https://gist.github.com/ScottPJones/fcd12f675edb3d79b5ce.
The tables created are also very small, at most a couple hundred bytes per character set:
the maximum is 256 bytes if the character set is ASCII compatible, 192 bytes if it is an ANSI character set, and only 64 bytes for CP1252, which would probably be the most used conversion.

Should we move towards this approach, at least for the 8-bit character set conversions?
It would also make it easy to add all of the options that Python 3 has for handling invalid characters
(error, remove, replace with a fixed replacement character (default 0xfffd) or string, insert a quoted XML escape sequence, or insert quoted as \uxxxx or \u{xxxx}).
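
To make the table-driven idea concrete, here is a minimal sketch of what a pure-Julia lookup conversion might look like. It is not the code from the Gist: the names `latin1_table` and `table_decode` are made up for illustration, and the table is simply filled in for ISO-8859-1, where each byte maps to the code point with the same value. A table for another 8-bit character set would have to be generated, for example from iconv output.

```julia
# Illustrative 256-entry table for ISO-8859-1: byte b maps to code point b.
# (Hypothetical name; a real table for another character set would differ.)
latin1_table = [Char(b) for b in 0:255]

# Decode bytes of an 8-bit character set into a UTF-8 String by table lookup,
# without calling into iconv. Accepts any AbstractVector{UInt8}, including views.
function table_decode(bytes::AbstractVector{UInt8}, table::Vector{Char})
    io = IOBuffer(sizehint = 2 * length(bytes))
    for b in bytes
        write(io, table[Int(b) + 1])   # +1 because Julia arrays are 1-based
    end
    return String(take!(io))
end

println(table_decode([0x6e, 0x6f, 0xeb, 0x6c], latin1_table))
```

This sketch stores one `Char` per byte value, which is larger than the compact tables described above; it trades space for simplicity and says nothing about relative performance.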

Add optimized read! and write methods

#38 implements an optimized readbytes! method. The same approach could be used to make read! and write more efficient with arrays, probably by overloading Base.unsafe_read and Base.unsafe_write.

no method matching skip

Hello,
I want to search some text for a keyword and extract the text from the keyword to the end.

I have the following simplified example:

(path, io) = mktemp();
write(path, "Hello World!") # dummy content
readuntil(io, "World") # search for keyword
skip(io, -5) # go back to include keyword in result
DesiredOutput = readuntil(io, "\n") # get rest of the file

However, this is not possible when working with an encoding:

io2 = open(path, enc"MS-ANSI") # open with encoding
readuntil(io2, "World") # search for keyword
skip(io2, -5) # go back to include keyword in result <-- errors
DesiredOutput = readuntil(io2, "\n") # get rest of the file

Should skip be supported, or is there a better solution for this?
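
One possible workaround that avoids seeking backwards is to decode the text into a Julia String first and then search within the string. The sketch below assumes the content is small enough to hold in memory and uses a literal string as a stand-in for the decoded file; `from_keyword` is a hypothetical helper name, not something the package provides.

```julia
# Stand-in for text that has already been decoded into a Julia String.
text = "Hello World!\nsecond line\n"

# Hypothetical helper: return the text from the first occurrence of `keyword`
# up to (not including) the next newline, or `nothing` if the keyword is absent.
function from_keyword(text::AbstractString, keyword::AbstractString)
    r = findfirst(keyword, text)
    r === nothing && return nothing
    stop = findnext(isequal('\n'), text, first(r))
    last_index = stop === nothing ? lastindex(text) : prevind(text, stop)
    return text[first(r):last_index]
end

println(from_keyword(text, "World"))
```

Because the search runs on the decoded string, no `skip` or `position` support is needed on the decoding stream.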

MethodError: no method matching position(::StringDecoder{...}) with readuntil

julia> readuntil(IOBuffer("noël"), enc"UTF-8", "ë")
"no"

julia> readuntil(IOBuffer("noël"), enc"UTF-8", 'ë')
ERROR: MethodError: no method matching position(::StringDecoder{Encoding{Symbol("UTF-8")},Encoding{Symbol("UTF-8")},Base.GenericIOBuffer{Array{UInt8,1}}})
Closest candidates are:
  position(::Base.Filesystem.File) at filesystem.jl:225
  position(::Base.Libc.FILE) at libc.jl:92
  position(::IOStream) at iostream.jl:188
  ...
Stacktrace:
 [1] mark(::StringDecoder{Encoding{Symbol("UTF-8")},Encoding{Symbol("UTF-8")},Base.GenericIOBuffer{Array{UInt8,1}}}) at ./io.jl:915
 [2] peek at ./iostream.jl:525 [inlined]
 [3] read(::StringDecoder{Encoding{Symbol("UTF-8")},Encoding{Symbol("UTF-8")},Base.GenericIOBuffer{Array{UInt8,1}}}, ::Type{Char}) at ./io.jl:625
 [4] #readuntil#283(::Bool, ::Function, ::StringDecoder{Encoding{Symbol("UTF-8")},Encoding{Symbol("UTF-8")},Base.GenericIOBuffer{Array{UInt8,1}}}, ::Char) at ./io.jl:646
 [5] #readuntil#11 at ./none:0 [inlined]
 [6] readuntil(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Encoding{Symbol("UTF-8")}, ::Char) at /home/milan/.julia/StringEncodings/src/StringEncodings.jl:443
 [7] top-level scope at none:0

Use BinaryBuilder

Since libiconv is already available in BB, it should be straightforward to use it.

Empty output from encode()

julia> encode("", "shift_jisx0213")
UInt8[]

This should have provided the same result as "sjis" encoding.

julia> encode("", "sjis")
2-element Vector{UInt8}:
 0x83
 0x4e

unsafe_wrap cause dead kernel on 0.3.2?

Version 0.3.2 uses unsafe_wrap when running on Julia 1.4.1:

@static if VERSION >= v"1.6.0-DEV.438"
    inbuf_view = view(s.inbuf, Int(s.inbytesleft[]+1):BUFSIZE)
else
    inbuf_view = unsafe_wrap(Array, pointer(s.inbuf, s.inbytesleft[]+1), BUFSIZE)
end

However, a BoundsError or a dead kernel occurs while decoding files, whereas encoding and decoding strings works fine.
I wonder whether unsafe_wrap may be the cause.

StringEncodings.jl uses libiconv-1.14 which is broken on Ubuntu 17.04 64-bit

~/.julia/v0.6/StringEncodings/deps/src/libiconv-1.14$ uname -a
Linux 4.10.0-30-generic #34-Ubuntu SMP Mon Jul 31 19:38:17 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Independently, building libiconv-1.14 also reports the exact same error.

make[3]: Leaving directory '/home/sambit/.julia/v0.6/StringEncodings/deps/src/libiconv-1.14'
gcc -DHAVE_CONFIG_H -DEXEEXT=\"\" -I. -I.. -I../lib -I../intl -DDEPENDS_ON_LIBICONV=1 -DDEPENDS_ON_LIBINTL=1 -g -O2 -c progname.c
In file included from progname.c:26:0:
./stdio.h:1010:1: error: ‘gets’ undeclared here (not in a function)
 _GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");
 ^
Makefile:914: recipe for target 'progname.o' failed
make[2]: *** [progname.o] Error 1

libiconv-1.15 does not have these errors and builds properly.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

New release needed post 0.2.2.

Release 0.2.2 downloads libiconv-1.15. However, the fix in build.jl needs to be published to METADATA in a new release so that iconv is picked up from libc.

Could not extract the platform key

Installation gives the following warning. It is not harmful, but I want to report the issue anyway.

┌ Warning: Could not extract the platform key of https://github.com/JuliaStrings/IConvBuilder/releases/download/v1.15+build.3/IConv.x86_64-linux-gnu.tar.gz; continuing...
└ @ BinaryProvider ~/.julia/packages/BinaryProvider/UTYxu/src/Prefix.jl:224
