ruby / zlib
Ruby interface for the zlib compression/decompression library
License: BSD 2-Clause "Simplified" License
JRuby has an implementation of zlib based on "jzlib", a Java port of libzlib. We need to incorporate our version into this gem in some way (in descending order of tight integration):
We will put together a PR soon.
I'm using the data.tar.gz from ruby-progressbar-1.13.0.gem, which seems to trigger this error. If I take exactly the same data.tar.gz, gunzip it, and re-gzip it, the bug doesn't happen. The gem itself unpacks fine; only readpartial breaks.
$ gem fetch ruby-progressbar -v 1.13.0
$ tar zxf ruby-progressbar-1.13.0.gem data.tar.gz
irb> io = File.open('data.tar.gz')
=> #<File:data.tar.gz>
irb> io.size
=> 10250
irb> Zlib::GzipReader.wrap(io) { |gzio| gzio.readpartial(16_384) until gzio.eof? } # rubygems does this, but with #read
Traceback (most recent call last):
7: from /usr/bin/irb:23:in `<main>'
6: from /usr/bin/irb:23:in `load'
5: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
4: from (irb):9
3: from (irb):9:in `wrap'
2: from (irb):9:in `block in irb_binding'
1: from (irb):9:in `readpartial'
EOFError (end of file reached)
I have done some digging to verify that I didn't cause this. It seems to require this specific data.tar.gz, which conflicts with how zlib implements readpartial. Other data.tar.gz files work, and the same data.tar re-gzipped also works.
I think the EOFError is raised here because it reaches the end of the file during gzfile_read_more(gz, outbuf); and then checks whether it's at the end before finalizing the unzip. https://github.com/ruby/zlib/blob/master/ext/zlib/zlib.c#L2933
This does seem like it could be fixed. Using gzio.read(16_384) instead of gzio.readpartial works.
Edit2: The error is real but my understanding of it and why it happened was wrong. Leaving this open with changed title until the fix is merged.
This bug blocks rubygems from using readpartial to read gems.
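The workaround mentioned above, looping on read with a length instead of readpartial, can be sketched as follows. This is a minimal sketch that relies on GzipReader#read(len) returning nil at the end of the stream; the helper name each_gzip_chunk and the in-memory sample data are made up for illustration and are not taken from rubygems.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: yields decompressed chunks of at most chunk_size bytes.
# GzipReader#read(len) returns nil once the stream is exhausted, so no
# EOFError handling is needed here.
def each_gzip_chunk(io, chunk_size = 16_384)
  Zlib::GzipReader.wrap(io) do |gz|
    while (chunk = gz.read(chunk_size))
      yield chunk
    end
  end
end

# In-memory sample standing in for data.tar.gz (an assumption for the demo).
sample = Zlib.gzip('hello world' * 10_000)
restored = +''
each_gzip_chunk(StringIO.new(sample)) { |chunk| restored << chunk }
puts restored.bytesize
```

With the real file, File.open('data.tar.gz', 'rb') would take the place of the StringIO.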
$ gem install zlib
Temporarily enhancing PATH for MSYS/MINGW...
Building native extensions. This could take a while...
ERROR: Error installing zlib:
ERROR: Failed to build gem native extension.
current directory: C:/Program Files/Ruby31-x64/lib/ruby/gems/3.1.0/gems/zlib-2.1.1/ext/zlib
C:/Program\ Files/Ruby31-x64/bin/ruby.exe -I C:/Program\ Files/Ruby31-x64/lib/ruby/3.1.0 -r ./siteconf20220412-23292-w9lkku.rb extconf.rb
checking for deflateReset() in -lz... *** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers. Check the mkmf.log file for more details. You may
need configuration options.
Provided configuration options:
--with-opt-dir
--without-opt-dir
--with-opt-include
--without-opt-include=${opt-dir}/include
--with-opt-lib
--without-opt-lib=${opt-dir}/lib
--with-make-prog
--without-make-prog
--srcdir=.
--curdir
--ruby=C:/Program Files/Ruby31-x64/bin/$(RUBY_BASE_NAME)
--with-zlib-dir
--without-zlib-dir
--with-zlib-include
--without-zlib-include=${zlib-dir}/include
--with-zlib-lib
--without-zlib-lib=${zlib-dir}/lib
--with-z-dir
--without-z-dir
--with-z-include
--without-z-include=${z-dir}/include
--with-z-lib
--without-z-lib=${z-dir}/lib
--with-zlib
--without-zlib
C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:498:in `try_do': The compiler failed to generate an executable file. (RuntimeError)
You have to install development tools first.
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:591:in `try_link0'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:609:in `try_link'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:830:in `try_func'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:1065:in `block in have_library'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:1007:in `block in checking_for'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:362:in `block (2 levels) in postpone'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:332:in `open'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:362:in `block in postpone'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:332:in `open'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:358:in `postpone'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:1006:in `checking_for'
from C:/Program Files/Ruby31-x64/lib/ruby/3.1.0/mkmf.rb:1060:in `have_library'
from extconf.rb:14:in `block in <main>'
from extconf.rb:14:in `each'
from extconf.rb:14:in `find'
from extconf.rb:14:in `<main>'
To see why this extension failed to compile, please check the mkmf.log which can be found here:
C:/Program Files/Ruby31-x64/lib/ruby/gems/3.1.0/extensions/x64-mingw-ucrt/3.1.0/zlib-2.1.1/mkmf.log
extconf failed, exit code 1
Gem files will remain installed in C:/Program Files/Ruby31-x64/lib/ruby/gems/3.1.0/gems/zlib-2.1.1 for inspection.
Results logged to C:/Program Files/Ruby31-x64/lib/ruby/gems/3.1.0/extensions/x64-mingw-ucrt/3.1.0/zlib-2.1.1/gem_make.out
The log files mentioned in the output are attached.
I am seeing the following test failures on RubyCI Ubuntu s390x version Jammy (22.04.3 LTS) on the ruby/zlib latest master branch 85637fa with the latest ruby/ruby master branch. Could you take a look at the failures? I think you can log in to the RubyCI Ubuntu s390x server to debug. Thank you for your help.
$ cat /etc/os-release | grep VERSION
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
$ uname -m
s390x
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ which ruby
/home/jaruga/.local/ruby-05a853c2f2-debug/bin/ruby
$ ruby -v
ruby 3.3.0dev (2023-09-11T15:25:06Z master 05a853c2f2) [s390x-linux]
$ bundle install --standalone
$ bundle list
Gems included by the bundle:
* power_assert (2.0.3)
* rake (13.0.6)
* rake-compiler (1.2.5)
* test-unit (3.6.1)
* test-unit-ruby-core (1.0.2)
* zlib (3.0.0)
Use `bundle info` to print more detailed information about a gem
$ bundle exec rake compile
$ ldd ./lib/zlib.so
linux-vdso64.so.1 (0x000003ffa007e000)
libruby.so.3.3 => /home/jaruga/.local/ruby-05a853c2f2-debug/lib/libruby.so.3.3 (0x000003ff9f900000)
libz.so.1 => /lib/s390x-linux-gnu/libz.so.1 (0x000003ff9f800000)
libm.so.6 => /lib/s390x-linux-gnu/libm.so.6 (0x000003ff9f700000)
libc.so.6 => /lib/s390x-linux-gnu/libc.so.6 (0x000003ff9f500000)
libgmp.so.10 => /lib/s390x-linux-gnu/libgmp.so.10 (0x000003ff9f400000)
libcrypt.so.1 => /lib/s390x-linux-gnu/libcrypt.so.1 (0x000003ff9f380000)
/lib/ld64.so.1 (0x000003ffa0000000)
The zlib deb package version in use is shown below.
$ dpkg -S /lib/s390x-linux-gnu/libz.so.1
zlib1g:s390x: /lib/s390x-linux-gnu/libz.so.1
$ dpkg -s zlib1g | grep ^Version
Version: 1:1.2.11.dfsg-2ubuntu9.2
$ bundle exec rake test
Loaded suite /home/jaruga/git/ruby/zlib/bundle/ruby/3.3.0+0/gems/rake-13.0.6/lib/rake/rake_test_loader
Started
F
======================================================================================================
Failure: test_deflate_stream(TestZlib)
/home/jaruga/git/ruby/zlib/test/zlib/test_zlib.rb:1411:in `test_deflate_stream'
1408: deflated << chunk
1409: end
1410:
=> 1411: assert_equal 20016, deflated.length
1412: end
1413:
1414: def test_gzip
<20016> expected but was
<21085>
diff:
? 2 0016
? 1 85
? + ???
======================================================================================================
P
======================================================================================================
Pending: test_gunzip_no_memory_leak(TestZlib): pended.
/home/jaruga/git/ruby/zlib/bundle/ruby/3.3.0+0/gems/test-unit-ruby-core-1.0.2/lib/core_assertions.rb:192:in `rescue in assert_no_memory_leak'
/home/jaruga/git/ruby/zlib/bundle/ruby/3.3.0+0/gems/test-unit-ruby-core-1.0.2/lib/core_assertions.rb:150:in `assert_no_memory_leak'
/home/jaruga/git/ruby/zlib/test/zlib/test_zlib.rb:1468:in `test_gunzip_no_memory_leak'
1465: end
1466:
1467: def test_gunzip_no_memory_leak
=> 1468: assert_no_memory_leak(%[-rzlib], "#{<<~"{#"}", "#{<<~'};'}")
1469: d = Zlib.gzip("data")
1470: {#
1471: 10_000.times {Zlib.gunzip(d)}
======================================================================================================
F
======================================================================================================
Failure: test_gzip(TestZlib)
/home/jaruga/git/ruby/zlib/test/zlib/test_zlib.rb:1419:in `test_gzip'
1416: actual[4, 4] = "\x00\x00\x00\x00" # replace mtime
1417: actual[9] = "\xff" # replace OS
1418: expected = %w[1f8b08000000000000ff4bcbcf07002165738c03000000].pack("H*")
=> 1419: assert_equal expected, actual
1420:
1421: actual = Zlib.gzip("foo".freeze, level: 0)
1422: actual[4, 4] = "\x00\x00\x00\x00" # replace mtime
<"\x1F\x8B\b\x00\x00\x00\x00\x00\x00\xFFK\xCB\xCF\a\x00!es\x8C\x03\x00\x00\x00"> expected but was
<"\x1F\x8B\b\x00\x00\x00\x00\x00\x00\xFFJ\xCB\xCF\a\f\x00!es\x8C\x03\x00\x00\x00">
diff:
? �K�� !es�
? J
? ? +
======================================================================================================
F
======================================================================================================
Failure: test_deflate_chunked(TestZlibDeflate)
/home/jaruga/git/ruby/zlib/test/zlib/test_zlib.rb:66:in `test_deflate_chunked'
63:
64: final = z.finish
65:
=> 66: assert_equal 7253, final.length
67:
68: chunks << final
69: all = chunks.join
<7253> expected but was
<8325>
diff:
? 7 253
? 83
? ? -
======================================================================================================
F
======================================================================================================
Failure: test_deflate_chunked_break(TestZlibDeflate)
/home/jaruga/git/ruby/zlib/test/zlib/test_zlib.rb:92:in `test_deflate_chunked_break'
89:
90: final = z.finish
91:
=> 92: assert_equal 3632, final.length
93:
94: all = chunks.join
95: all << final
<3632> expected but was
<4702>
diff:
? 3632
? 470
? ???
======================================================================================================
F
======================================================================================================
Failure: test_unused2(TestZlibGzipReader)
/home/jaruga/git/ruby/zlib/test/zlib/test_zlib.rb:968:in `test_unused2'
965: io = Zlib::GzipReader.new zio
966: assert_equal('aaaa', io.read)
967: unused = io.unused
=> 968: assert_equal(24, unused.bytesize)
969: io.finish
970:
971: zio.pos -= unused.length
<24> expected but was
<23>
diff:
? 24
? 3
? ?
======================================================================================================
|
Finished in 6.146844438 seconds.
------------------------------------------------------------------------------------------------------
95 tests, 511 assertions, 5 failures, 0 errors, 1 pendings, 0 omissions, 0 notifications
93.6842% passed
------------------------------------------------------------------------------------------------------
15.46 tests/s, 83.13 assertions/s
rake aborted!
Command failed with status (1)
/home/jaruga/git/ruby/zlib/bundle/ruby/3.3.0+0/gems/rake-13.0.6/exe/rake:27:in `<top (required)>'
/home/jaruga/.local/ruby-05a853c2f2-debug/bin/bundle:25:in `load'
/home/jaruga/.local/ruby-05a853c2f2-debug/bin/bundle:25:in `<main>'
Tasks: TOP => test
(See full trace by running task with --trace)
I can also see the failures on the latest ruby/ruby master branch, 05a853c2f21f60f9e1c544c2d0709f10de453571.
$ make V=1 test-all TESTS="test/zlib/test_zlib.rb -v"
...
1) Failure:
TestZlib#test_deflate_stream [/home/jaruga/git/ruby/ruby/test/zlib/test_zlib.rb:1411]:
<20016> expected but was
<21085>.
2) Failure:
TestZlib#test_gzip [/home/jaruga/git/ruby/ruby/test/zlib/test_zlib.rb:1419]:
<"\x1F\x8B\b\x00\x00\x00\x00\x00\x00\xFFK\xCB\xCF\a\x00!es\x8C\x03\x00\x00\x00"> expected but was
<"\x1F\x8B\b\x00\x00\x00\x00\x00\x00\xFFJ\xCB\xCF\a\f\x00!es\x8C\x03\x00\x00\x00">.
3) Failure:
TestZlibDeflate#test_deflate_chunked [/home/jaruga/git/ruby/ruby/test/zlib/test_zlib.rb:66]:
<7253> expected but was
<8325>.
4) Failure:
TestZlibDeflate#test_deflate_chunked_break [/home/jaruga/git/ruby/ruby/test/zlib/test_zlib.rb:92]:
<3632> expected but was
<4702>.
5) Failure:
TestZlibGzipReader#test_unused2 [/home/jaruga/git/ruby/ruby/test/zlib/test_zlib.rb:968]:
<24> expected but was
<23>.
Finished tests in 2.985270s, 31.4879 tests/s, 236.1595 assertions/s.
94 tests, 705 assertions, 5 failures, 0 errors, 0 skips
ruby -v: ruby 3.3.0dev (2023-09-11T15:25:06Z master 05a853c2f2) [s390x-linux]
make: *** [uncommon.mk:914: yes-test-all] Error 5
...
Hi.
I have to inflate a .csv.gz file, which should produce a 4 GB CSV with 25 million rows.
When I use an app or the gzip command line, I get the full file without issue.
When I use Zlib::GzipReader, only the first row is returned.
> Zlib::GzipReader.open("adresses-france.csv.gz") { |gz| print gz.read }
id;id_fantoir;numero;rep;nom_voie;code_postal;code_insee;nom_commune;code_insee_ancienne_commune;nom_ancienne_commune;x;y;lon;lat;type_position;alias;nom_ld;libelle_acheminement;nom_afnor;source_position;source_nom_voie;certification_commune;cad_parcelles
=> nil
The file is provided by the French government:
There are many other files in the directory (one for each region), but I cannot reproduce the issue with them.
This service also provides a similar file in Addok format (https://adresse.data.gouv.fr/data/ban/adresses/latest/addok/adresses-addok-france.ndjson.gz), which should produce a 3 GB file with 2 million rows, but Zlib::GzipReader returns only the first 25k rows.
Is there any limit to what Zlib can support (size, rows, ...)?
Does the problem come from the compressed file?
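One possible explanation, which is an assumption and is not confirmed anywhere in this report, is that the file consists of several gzip members concatenated together. Zlib::GzipReader#read stops at the end of the first member, while the gzip command line keeps going. The sketch below builds such a file in memory and compares the two behaviours; Zlib::GzipReader.zcat is only available in reasonably recent versions of the zlib gem (Ruby 2.7 and later, as far as I know).

```ruby
require 'zlib'
require 'stringio'

# Two gzip members back to back, as `cat a.gz b.gz > both.gz` would produce.
multi = Zlib.gzip("header\n") + Zlib.gzip("row 1\nrow 2\n")

# A plain GzipReader decodes only the first member.
first_only = Zlib::GzipReader.new(StringIO.new(multi)).read

# zcat keeps decoding members until the underlying IO is exhausted.
everything = Zlib::GzipReader.zcat(StringIO.new(multi))

puts first_only.inspect
puts everything.inspect
```

For a multi-gigabyte file, passing a block to zcat yields the data in chunks and avoids building one large string.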
We were hit by this bug that got recently fixed by #22. Would it be possible to make a new release including it?
I pushed this gem for our case as a temporary measure, but it would be great to be able to use the official one.
By the way, thanks so much to @wanabe for fixing this one 🙏. The possibility of zlib silently writing corrupted data is a serious one and, depending on the scenario, can cause data loss.
Would it be difficult to make Zlib.crc32 accept IO instances?
Currently we have to read the whole file and pass the (potentially big) string to Zlib.crc32.
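Until something like that exists, the checksum can be computed incrementally, because Zlib.crc32 takes the running CRC as an optional second argument. A minimal sketch follows; the helper name crc32_of_io and the 64 KiB chunk size are arbitrary choices for illustration.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: feeds the IO to Zlib.crc32 chunk by chunk so the whole
# file never has to be held in memory at once.
def crc32_of_io(io, chunk_size = 64 * 1024)
  crc = 0
  while (chunk = io.read(chunk_size))
    crc = Zlib.crc32(chunk, crc) # continue from the previous running value
  end
  crc
end

data = 'abc' * 100_000
streamed = crc32_of_io(StringIO.new(data))
puts streamed == Zlib.crc32(data)
```

For a file on disk, File.open(path, 'rb') { |f| crc32_of_io(f) } would be the equivalent call.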
When running the tests for this gem against ruby 3.2.2 I get the following error:
Loaded suite /usr/lib/ruby/gems/3.2.0/gems/rake-13.0.6/lib/rake/rake_test_loader
Started
........E
===============================================================================
Error: test_gunzip_encoding(TestZlib): Zlib::DataError: invalid stored block lengths
/build/ruby-zlib/src/zlib-3.1.0/test/zlib/test_zlib.rb:1466:in `gunzip'
/build/ruby-zlib/src/zlib-3.1.0/test/zlib/test_zlib.rb:1466:in `test_gunzip_encoding'
1463: def test_gunzip_encoding
1464: # vvvvvvvv = mtime, but valid UTF-8 string of U+0080
1465: src = %w[1f8b0800c28000000003cb48cdc9c9070086a6103605000000].pack("H*").force_encoding('UTF-8')
=> 1466: assert_equal 'hello', Zlib.gunzip(src.freeze)
1467: end
1468:
1469: def test_gunzip_no_memory_leak
===============================================================================
P
===============================================================================
Pending: test_gunzip_no_memory_leak(TestZlib): pended.
/usr/lib/ruby/gems/3.2.0/gems/test-unit-ruby-core-1.0.5/lib/core_assertions.rb:192:in `rescue in assert_no_memory_leak'
/usr/lib/ruby/gems/3.2.0/gems/test-unit-ruby-core-1.0.5/lib/core_assertions.rb:150:in `assert_no_memory_leak'
/build/ruby-zlib/src/zlib-3.1.0/test/zlib/test_zlib.rb:1470:in `test_gunzip_no_memory_leak'
1467: end
1468:
1469: def test_gunzip_no_memory_leak
=> 1470: assert_no_memory_leak(%[-rzlib], "#{<<~"{#"}", "#{<<~'};'}")
1471: d = Zlib.gzip("data")
1472: {#
1473: 10_000.times {Zlib.gunzip(d)}
===============================================================================
...............................................................................
......
Finished in 0.815445996 seconds.
-------------------------------------------------------------------------------
95 tests, 517 assertions, 0 failures, 1 errors, 1 pendings, 0 omissions, 0 notifications
97.8947% passed
-------------------------------------------------------------------------------
116.50 tests/s, 634.01 assertions/s
rake aborted!
Command failed with status (1)
/build/ruby-zlib/src/zlib-3.1.0/Rakefile:10:in `block in <top (required)>'
Tasks: TOP => test_internal
(See full trace by running task with --trace)
Am I missing some dependency?
Best regards
While it is great to have streaming support for inflate, the max RSS can be fairly high, depending on GC frequency. With gunzip, by comparison, the max RSS is only 872K.
Question: can we further reduce the max RSS of streaming inflate? It seems we are using RB_ALLOC, and GC may not free the memory in a timely manner. Could we RB_ALLOC a small buffer once and reuse it, instead of allocating a new buffer with RB_ALLOC each time?
Test steps:
$ dd if=/dev/zero of=/dev/stdout bs=1m count=1024 | gzip -c > 1G.gz
$ cat test.rb
require 'zlib'

f = File.open('1G.gz', 'rb')
# MAX_WBITS + 32 enables automatic gzip/zlib header detection
z = Zlib::Inflate.new(Zlib::MAX_WBITS + 32)
step = 0
total = 0
until f.eof?
  gzipped = f.read(4096)
  z.inflate(gzipped) do |chunk|
    total += chunk.size
    step += 1
    # GC.start if step % 100 == 0 # comment out this line to see larger `max rss`
  end
end
puts "total: #{total}"
z.finish
# end of test.rb
The max RSS is around 150M
$ /usr/bin/time -l ruby test.rb
total: 1073741824
2.26 real 2.12 user 0.13 sys
156069888 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
81828 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
930 involuntary context switches
Uncomment the line GC.start if step % 100 == 0. With GC invoked more often, the max RSS is reduced to around 26M.
$ /usr/bin/time -l ruby test.rb
total: 1073741824
2.74 real 2.70 user 0.03 sys
26480640 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
6517 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
1 voluntary context switches
862 involuntary context switches
Compare this to gunzip, where the max RSS is only about 870K.
$ /usr/bin/time -l gunzip 1G.gz
1.58 real 1.23 user 0.28 sys
872448 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
236 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
138 voluntary context switches
176 involuntary context switches
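At the Ruby level, one way to approximate the "reuse one buffer" idea is GzipReader#readpartial, which accepts an optional output buffer as its second argument. The sketch below is an assumption-laden illustration and not a measured fix: the helper name inflated_size is hypothetical, the input is a small in-memory stand-in for 1G.gz, and whether this actually lowers max RSS has not been benchmarked here.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: counts decompressed bytes while reusing one String as
# the output buffer for every readpartial call.
def inflated_size(io, chunk_size = 16_384)
  total = 0
  buffer = String.new(capacity: chunk_size)
  gz = Zlib::GzipReader.new(io)
  begin
    loop do
      gz.readpartial(chunk_size, buffer) # overwrites buffer with the next chunk
      total += buffer.bytesize
    end
  rescue EOFError
    # readpartial signals the end of the stream by raising EOFError
  ensure
    gz.close
  end
  total
end

# Small in-memory sample (an assumption for the demo).
sample = Zlib.gzip("\0" * 1_000_000)
puts inflated_size(StringIO.new(sample))
```

The same loop could be run against File.open('1G.gz', 'rb') under /usr/bin/time -l to compare it with the numbers above.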
Ref: opal/opal#2463
The issue can't be reproduced reliably. The most reliable way is with the following code (you may need to run it a few times). WARNING: it forks a lot of processes:
[user@localhost opal]# ruby _testzlib.rb
_testzlib.rb:5:in `gzip': buffer error (Zlib::BufError)
from _testzlib.rb:5:in `<main>'
[user@localhost opal]# _testzlib.rb:5:in `gzip': buffer error (Zlib::BufError)
from _testzlib.rb:5:in `<main>'
[user@localhost opal]#
[user@localhost opal]# cat _testzlib.rb
require 'zlib'
10.times { fork }
Zlib.gzip("sdfasfasdfasdfasdasdfadfasdf" * 10000)
[user@localhost opal]#
The issue has been seen on opal's CI since Ruby 2.6 until 3.1.
Minimal reproducible script:
require "securerandom"
require "stringio"
require "zlib"

content = SecureRandom.base64(5000)
gzipped = Zlib.gzip(content)

thr = Thread.new do
  loop do
    Zlib::GzipReader.new(StringIO.new(gzipped)).read
  end
end

loop do
  thr.wakeup
end
leads to
#<Thread:0x000000010511d090 gunzip.rb:8 run> terminated with exception (report_on_exception is true):
gunzip.rb:10:in `initialize': buffer error (Zlib::BufError)
from gunzip.rb:10:in `new'
from gunzip.rb:10:in `block (2 levels) in <main>'
from gunzip.rb:9:in `loop'
from gunzip.rb:9:in `block in <main>'
gunzip.rb:15:in `wakeup': killed thread (ThreadError)
from gunzip.rb:15:in `block in <main>'
from gunzip.rb:14:in `loop'
from gunzip.rb:14:in `<main>'
The error doesn't happen, however, if we change Zlib::GzipReader.new(StringIO.new(gzipped)).read to Zlib.gunzip(gzipped), but it still happens with Zlib.gzip(content).
Probably related to #49
A new version of zlib 1.2.12 is available to address a bug that can crash deflate on some input when using Z_FIXED. Any plans to upgrade the gem with latest version of zlib?