py7zr's Introduction

py7zr -- a 7z library on python

py7zr is a library and utility, written in the Python programming language, that supports 7zip archive compression, decompression, encryption and decryption.

Discussion Forum

You are welcome to join discussions on the project forum at https://github.com/miurahr/py7zr/discussions

There you can find announcements of new releases, questions and answers, and new feature ideas. If the manuals leave you unsure how to use the py7zr library, please feel free to raise a question on the forum.

Security Notice

Please see the Security Policy of this project.

Versions 0.20.0, 0.19.0, and 0.18.10 or earlier have a path traversal vulnerability. Details are in the disclosure article "CVE-2022-44900: path traversal vulnerability in py7zr".

Affected versions are vulnerable to directory traversal due to insufficient checks in the 'py7zr.py' and 'helpers.py' files.

You are recommended to update immediately to version 0.20.2 or later, 0.19.2 or 0.18.12.

I really appreciate Mr. Matteo Cosentino's notification and cooperation on the security improvement.
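
Beyond upgrading, an application can check member names itself before extracting an untrusted archive. The following is a minimal sketch of such a check, not the fix that py7zr applied; `is_within_directory` is a hypothetical helper, and it only looks at path names, not at symbolic links stored inside the archive.

```python
import os


def is_within_directory(base_dir: str, member_name: str) -> bool:
    """Return True if member_name, joined to base_dir, stays inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, member_name))
    # commonpath() compares whole path components, so "/tmp/out2" is not
    # mistaken for a child of "/tmp/out".
    return os.path.commonpath([base, target]) == base


print(is_within_directory("/tmp/out", "docs/readme.txt"))
print(is_within_directory("/tmp/out", "../../etc/passwd"))
```

Names returned by getnames() could be filtered with such a helper before they are passed to extract(targets=...).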

Compression algorithms

py7zr supports the algorithms and filters that the lzma module and liblzma support. It also supports BZip2 and Deflate, which are implemented in the Python core libraries, and ZStandard, Brotli and PPMd through third-party libraries.

py7zr can also encrypt and decrypt data using a third-party encryption library.

Supported algorithms

  • compress
    • LZMA2
    • LZMA
    • Bzip2
    • Deflate
    • Copy
    • ZStandard
    • Brotli
    • PPMd
    • Enhanced Deflate (Experimental)
  • crypt
    • 7zAES
  • Filters
    • Delta
    • BCJ(X86,ARMT,ARM,PPC,SPARC,IA64)

Note

  • Symbolic link handling is basically compatible with the p7zip implementation, but it does not work with the original 7-zip, which does not implement the feature.
  • py7zr tries to check symbolic links strictly and raises ValueError when a bad link is requested, but it does not guarantee that all bad cases are blocked.
  • ZStandard and Brotli are not default methods of 7-zip, so archives using them are considered incompatible with the original 7-zip on Windows and p7zip on Linux/Mac.
  • Enhanced Deflate is also known as DEFLATE64(TM), a registered trademark of PKWARE, Inc.
  • Enhanced Deflate is tested only on CPython. It is disabled on PyPy.

Not supported algorithms

Install

You can install py7zr with pip, as with other libraries.

$ pip install py7zr

Alternatively, using conda:

$ conda install -c conda-forge py7zr

Documents

User manuals

Developer guide

CLI Usage

You can run the py7zr command script as follows:

  • List archive contents
$ py7zr l test.7z
  • Extract archive
$ py7zr x test.7z
  • Extract archive with password
$ py7zr x -P test.7z
  password?: ****
  • Create and compress to archive
$ py7zr c target.7z test_dir
  • Create multi-volume archive
$ py7zr c -v 500k target.7z test_dir
  • Test archive
$ py7zr t test.7z
  • Append files to archive
$ py7zr a test.7z test_dir
  • Show information
$ py7zr i
  • Show version
$ py7zr --version

SevenZipFile Class Usage

py7zr is a library that you can use in your Python application.

Decompression/Decryption

Here is a code snippet showing how to decompress a file in your application.

import py7zr

archive = py7zr.SevenZipFile('sample.7z', mode='r')
archive.extractall(path="/tmp")
archive.close()

You can also use a 'with' block because py7zr provides a context manager (v0.6 and later).

import py7zr

with py7zr.SevenZipFile('sample.7z', mode='r') as z:
    z.extractall()

with py7zr.SevenZipFile('target.7z', 'w') as z:
    z.writeall('./base_dir')

py7zr also supports extraction of single or selected files with 'extract(targets=['file path'])'. Note: if you specify only a file but not its parent directory, extraction will fail.

import py7zr
import re

filter_pattern = re.compile(r'<your/target/file_and_directories/regex/expression>')
with py7zr.SevenZipFile('archive.7z', 'r') as archive:
    allfiles = archive.getnames()
    selective_files = [f for f in allfiles if filter_pattern.match(f)]
    archive.extract(targets=selective_files)
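
Because a file listed without its parent directory fails to extract, it can help to expand a target list so that every parent directory is included. The following is a minimal sketch under the assumption that archive member names use forward slashes; `with_parent_dirs` is a hypothetical helper, not part of py7zr.

```python
from pathlib import PurePosixPath


def with_parent_dirs(targets):
    """Return targets plus every parent directory, parents first."""
    expanded = set()
    for name in targets:
        path = PurePosixPath(name)
        expanded.add(str(path))
        # Parents of "a/b/c.txt" are "a/b", "a" and "."; skip the "." entry.
        expanded.update(str(p) for p in path.parents if str(p) != ".")
    # Sorting by depth, then name, lists each directory before its contents.
    return sorted(expanded, key=lambda n: (n.count("/"), n))


print(with_parent_dirs(["docs/api/index.html", "README.txt"]))
```

The resulting list could then be passed as the targets argument of extract().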

py7zr supports extraction of password-protected archives (v0.6 and later).

import py7zr

with py7zr.SevenZipFile('encrypted.7z', mode='r', password='secret') as z:
    z.extractall()

Compression/Encryption

Here is a code snippet showing how to produce an archive.

import py7zr

with py7zr.SevenZipFile('target.7z', 'w') as archive:
    archive.writeall('/path/to/base_dir', 'base')

To create an encrypted archive, pass a password.

import py7zr

with py7zr.SevenZipFile('target.7z', 'w', password='secret') as archive:
    archive.writeall('/path/to/base_dir', 'base')

To create an archive with algorithms such as zstandard, pass custom filters.

import py7zr

my_filters = [{"id": py7zr.FILTER_ZSTD}]
another_filters = [{"id": py7zr.FILTER_ARM}, {"id": py7zr.FILTER_LZMA2, "preset": 7}]
with py7zr.SevenZipFile('target.7z', 'w', filters=my_filters) as archive:
    archive.writeall('/path/to/base_dir', 'base')

shutil helper

py7zr also supports the shutil interface.

from py7zr import pack_7zarchive, unpack_7zarchive
import shutil

# register file format at first.
shutil.register_archive_format('7zip', pack_7zarchive, description='7zip archive')
shutil.register_unpack_format('7zip', ['.7z'], unpack_7zarchive)

# extraction
shutil.unpack_archive('test.7z', '/tmp')

# compression
shutil.make_archive('target', '7zip', 'src')

Requirements

py7zr uses the Python 3 standard lzma module for extraction and compression. The standard lzma module uses liblzma, which supports the core compression algorithm of 7zip.

Minimum required version is Python 3.7.

py7zr is tested on Linux, macOS, Windows and Ubuntu aarch64.

It hopefully works on M1 Mac too.

Recommended versions are:

  • CPython 3.7.5, CPython 3.8.0 and later.
  • PyPy3.7-7.3.3 and later.

The following fixes are included in these versions; they are not fixed in Python 3.6.

  • BPO-21872: LZMA library sometimes fails to decompress a file
  • PyPy3-3090: lzma.LZMADecompressor.decompress does not respect max_length
  • PyPy3-3242: '_lzma_cffi' has no function named 'lzma_stream_encoder'

The following improvement is included in CPython 3.10:

  • BPO-41486: Faster bz2/lzma/zlib via new output buffering

Dependencies

There are several dependencies to support algorithms and CLI expressions.

Package          Purpose
PyCryptodomex    7zAES encryption
PyZstd           ZStandard compression
PyPPMd           PPMd compression
Brotli           Brotli compression (CPython)
BrotliCFFI       Brotli compression (PyPy)
inflate64        Enhanced deflate compression
pybcj            BCJ filters
multivolumefile  Multi-volume archive read/write
texttable        CLI formatter

Performance

You can find compression and decompression benchmark results in a [GitHub issue](#297) and on the [wiki page](https://github.com/miurahr/py7zr/wiki/Benchmarks).

py7zr works well, but it is slower than the 7-zip and p7zip C/C++ implementations for several reasons. When compression/decompression speed is important, it is recommended to use these alternatives through Python's subprocess.run interface.
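
Calling the external tool could look like the following minimal sketch. It assumes a `7z` executable is available on PATH and uses the `x`, `-o` and `-y` switches of the 7-zip command line; the function names are hypothetical.

```python
import shutil
import subprocess


def build_extract_command(archive, out_dir, executable="7z"):
    """Build the argument list for '7z x <archive> -o<out_dir> -y'."""
    # "-o" takes the directory with no space; "-y" answers yes to prompts.
    return [executable, "x", archive, f"-o{out_dir}", "-y"]


def extract_with_7z(archive, out_dir):
    """Run the external 7z binary; raise if it is missing or fails."""
    executable = shutil.which("7z")
    if executable is None:
        raise FileNotFoundError("7z executable not found on PATH")
    subprocess.run(build_extract_command(archive, out_dir, executable), check=True)
```

Passing the arguments as a list avoids shell quoting problems with archive names that contain spaces.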

py7zr consumes some memory to decompress and compress data. It requires at least about 300MiB - 700MiB of free memory to work well.

Use Cases

  • aqtinstall Another (unofficial) Qt (aqt) CLI Installer on multi-platforms.
  • PreNLP Preprocessing Library for Natural Language Processing
  • mlox a tool for sorting and analyzing Morrowind plugin load order

License

  • Copyright (C) 2019-2024 Hiroshi Miura
  • pylzma Copyright (c) 2004-2015 by Joachim Bauch
  • 7-Zip Copyright (C) 1999-2010 Igor Pavlov
  • LZMA SDK Copyright (C) 1999-2010 Igor Pavlov

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

py7zr's Issues

Extract file to buffer

Is your feature request related to a problem? Please describe.
I need to get a file from an archive and work with it in memory. Currently I have to extract the file into a temporary folder first and then read the extracted file. So there are two I/O steps: first writing the uncompressed file to disk, then reading it back from disk.

Describe the solution you'd like
Some way to extract a file into io.BytesIO() without writing and reading a temporary file on disk.

Describe alternatives you've considered
Alternatively, the archive could be extracted into a dict of io.BytesIO(), where keys are file paths and values are io.BytesIO() objects.

Additional context
None

Make the API more like core libraries

Is your feature request related to a problem? Please describe.
The current API is not bad, although I do think there's some confusion when dealing with the list of file information objects the archive keeps internally. However, given Python already has a few libraries to deal with file compression/decompression/archival, such as zipfile and tarfile, and the fact that there are other 3rd party libraries that use a similar API as well, such as rarfile, pursuing something similar in this project might be something to consider.

Given the fact Python 2.x has been EOL'd, future versions of this library might just target Python 3, which will make things a lot easier when pursuing such change.

Describe the solution you'd like
Adapting the API in order to make it more like zipfile and tarfile.

Describe alternatives you've considered
There are no current alternative solutions capable of handling 7z files in the way I want.

Additional context
Python's zipfile library
Python's tarfile library
rarfile library

I'm pretty sure you are familiar with these, but I am not sure if not using a similar API was a deliberate choice or not.

Does it need internal queue(bringbuf)?

From the coverage report, the code block that runs when there is data in the queue (queue.len > 0) is never recorded as executed.

This may indicate that we don't need the queue and that the queue in lzma.decompressor is enough.

OverflowError in archiveinfo.py

Traceback (most recent call last):
  File "main.py", line 112, in <module>
    archive.close()
  File "/home/fade/PycharmProjects/mbPatchCreator/venv/lib/python3.7/site-packages/py7zr/py7zr.py", line 764, in close
    self._write_archive()
  File "/home/fade/PycharmProjects/mbPatchCreator/venv/lib/python3.7/site-packages/py7zr/py7zr.py", line 579, in _write_archive
    encoded=self.encoded_header_mode)
  File "/home/fade/PycharmProjects/mbPatchCreator/venv/lib/python3.7/site-packages/py7zr/archiveinfo.py", line 977, in write
    self.files_info.write(file)
  File "/home/fade/PycharmProjects/mbPatchCreator/venv/lib/python3.7/site-packages/py7zr/archiveinfo.py", line 854, in write
    self._write_times(file, Property.CREATION_TIME, 'creationtime')
  File "/home/fade/PycharmProjects/mbPatchCreator/venv/lib/python3.7/site-packages/py7zr/archiveinfo.py", line 775, in _write_times
    write_byte(fp, (num_defined * 8 + 2).to_bytes(1, byteorder='little'))
OverflowError: int too big to convert

The biggest file is a 7.7 MB .war application. In total, the uncompressed files are 9.6 MB.
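
The last frame of the traceback packs `num_defined * 8 + 2` into a single byte. A byte holds at most 255, so this expression overflows as soon as num_defined reaches 32, regardless of file sizes. The sketch below only reproduces that arithmetic; `encode_one_byte` is a hypothetical stand-in for the quoted line, and the report does not state how many timestamps the archive actually had.

```python
def encode_one_byte(num_defined: int) -> bytes:
    """Mirror the expression from the traceback: one byte for num_defined * 8 + 2."""
    return (num_defined * 8 + 2).to_bytes(1, byteorder="little")


print(encode_one_byte(31))  # 31 * 8 + 2 = 250 still fits into one byte

try:
    encode_one_byte(32)  # 32 * 8 + 2 = 258 does not fit
except OverflowError as exc:
    print("OverflowError:", exc)
```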

Issue in close

It just calls self.fp.close(), but it should call self._fp_close(), which is thread safe and checks the reference counter.

Bug: wrong emptystream flag settings

Describe the bug

When compressing directories whose number is >8 or >16, py7zr produces a wrong emptystream boolean field for them.

Related issue
#46

To Reproduce

  1. create a test case whose source directory has many (>8) directories and no files
  2. compress using py7zr API
  3. check the emptystream field of the header, which should be all True but has some False entries

Expected behavior
The emptystream field indicates all True.

I'll push a test case as a unit test.
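
A boolean field with more than 8 entries spans several bytes, which is where an off-by-one in the bit position can produce stray False values. The following is a minimal sketch of such packing, assuming the most-significant-bit-first order that the 7z format uses for boolean vectors; `pack_bools` and `unpack_bools` are hypothetical helpers, not py7zr's code.

```python
def pack_bools(flags):
    """Pack booleans into bytes, most significant bit first, zero-padded."""
    out = bytearray((len(flags) + 7) // 8)
    for index, flag in enumerate(flags):
        if flag:
            out[index // 8] |= 0x80 >> (index % 8)
    return bytes(out)


def unpack_bools(data, count):
    """Inverse of pack_bools for the first count entries."""
    return [bool(data[i // 8] & (0x80 >> (i % 8))) for i in range(count)]


packed = pack_bools([True] * 9)
print(packed.hex())
print(unpack_bools(packed, 9))
```

A round trip through both helpers for 9 and 17 entries is the kind of case the unit test mentioned above would cover.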

Support "Split to Volumes" 7zip feature

In 7zip, we can split the target file/folder into volumes using -v.

Here is the documentation: Parameters -v{Size}[b | k | m | g] Specifies volume sizes in Bytes, Kilobytes (1 Kilobyte = 1024 bytes), Megabytes (1 Megabyte = 1024 Kilobytes) or Gigabytes (1 Gigabyte = 1024 Megabytes). If you specify only {Size}, 7-zip will treat it as bytes.

It's possible to specify several values. E.g., 10k 15k 2m creates three volumes: 10KB, 15KB, and 2MB in size.
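
The size syntax quoted above could be parsed as in the following minimal sketch. `parse_volume_size` is a hypothetical helper that follows the quoted rules (1024-based units, a bare number means bytes); it is not py7zr's implementation.

```python
import re

_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_volume_size(spec: str) -> int:
    """Convert a size such as '500k' or '2m' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", spec.strip().lower())
    if match is None:
        raise ValueError(f"invalid volume size: {spec!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


print([parse_volume_size(s) for s in "10k 15k 2m".split()])
```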

Thanks in advance

CLI: Passing password interactively

Is your feature request related to a problem? Please describe.
When decrypting and encrypting, the user needs to pass a password to the program. A command line option can be seen by other processes, which is sometimes a security problem.
It is better to provide a way to pass the password through an interactive session, the same as p7zip does.

Describe the solution you'd like
When py7zr -P <archive> is launched on a terminal, the process prompts password?: and waits for input.
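
One way to read the password without echoing it or exposing it on the command line is the standard getpass module. The following is a minimal sketch; `ask_password` and its `reader` parameter are hypothetical, the latter added only so the function can be exercised without a terminal.

```python
import getpass


def ask_password(prompt="password?: ", reader=getpass.getpass):
    """Read a password without echoing it; refuse an empty answer."""
    password = reader(prompt)
    if not password:
        raise ValueError("empty password")
    return password
```

The returned string could then be passed as the password argument of SevenZipFile.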

Explanation about symbolic link

Is your feature request related to a problem? Please describe.

v0.5b3 supports archiving symbolic links in the same way as p7zip does.
The link extension is not implemented by the original 7zip.

Describe the solution you'd like

We need to explain the situation and limitations in the README and documentation.

Archive with links like symlinks stores links with absolute path

Describe the bug
I don't know whether this is a bug or a feature, but for an archive with links such as symlinks, if the file is not in the same folder, the links are stored with absolute paths.

To Reproduce

  1. Prepare test data:
import os, pathlib

parent_path_drive = "X:/"
parent_path = os.path.join(parent_path_drive, "symb")

with open(os.path.join(parent_path, "Original1.txt"), "w") as f:
    f.write("real Original1.txt")

s = pathlib.Path(os.path.join(parent_path, "rel/path/link_to_Original1.txt"))
s.parent.mkdir(parents=True, exist_ok=True)
s.symlink_to(os.path.join(parent_path, "Original1.txt"), False)

s = pathlib.Path(os.path.join(parent_path, "rel/path/link_to_link_Original1.txt"))
s.parent.mkdir(parents=True, exist_ok=True)
s.symlink_to(os.path.join(parent_path, "rel/path/link_to_Original1.txt"), False)

s = pathlib.Path(os.path.join(parent_path, "rel/path/link_to_link_to_link_Original1.txt"))
s.parent.mkdir(parents=True, exist_ok=True)
s.symlink_to(os.path.join(parent_path, "rel/path/link_to_link_Original1.txt"), False)

s = pathlib.Path(os.path.join(parent_path, "rel/link_to_link_to_link_Original1.txt"))
s.parent.mkdir(parents=True, exist_ok=True)
s.symlink_to(os.path.join(parent_path, "rel/path/link_to_link_Original1.txt"), False)

s = pathlib.Path(os.path.join(parent_path, "a/rel64"))
s.parent.mkdir(parents=True, exist_ok=True)
s.symlink_to(os.path.join(parent_path, "rel"), True)


s = pathlib.Path(os.path.join(parent_path, "lib/Original2.txt"))
s.parent.mkdir(parents=True, exist_ok=True)
with open(os.path.join(parent_path, "lib/Original2.txt"), "w") as f:
    f.write("real Original2.txt")

s = pathlib.Path(os.path.join(parent_path, "lib/Original2.[1.2.3].txt"))
s.parent.mkdir(parents=True, exist_ok=True)
s.symlink_to(os.path.join(parent_path, "lib/Original2.txt"), False)

s = pathlib.Path(os.path.join(parent_path, "lib/Original2.[1.2].txt"))
s.parent.mkdir(parents=True, exist_ok=True)
s.symlink_to(os.path.join(parent_path, "lib/Original2.[1.2.3].txt"), False)

s = pathlib.Path(os.path.join(parent_path, "lib/Original2.[1].txt"))
s.parent.mkdir(parents=True, exist_ok=True)
s.symlink_to(os.path.join(parent_path, "lib/Original2.[1.2].txt"), False)

s = pathlib.Path(os.path.join(parent_path, "lib64"))
s.symlink_to(os.path.join(parent_path, "lib"), True)


s = pathlib.Path(os.path.join(parent_path_drive, "Original3.txt"))
s.parent.mkdir(parents=True, exist_ok=True)
with open(os.path.join(parent_path_drive, "Original3.txt"), "w") as f:
    f.write("real Original3.txt")

s = pathlib.Path(os.path.join(parent_path, "Original3.[1].txt"))
s.parent.mkdir(parents=True, exist_ok=True)
s.symlink_to(os.path.join(parent_path_drive, "Original3.txt"), False)
  2. Run the following code with python3.
import py7zr, os

os.chdir(parent_path)
archive = py7zr.SevenZipFile(os.path.join(parent_path_drive, "symb_2.7z"), 'w')
archive.writeall('', '')
archive.close()
  3. symb_2.7z is produced
    file in archive symb_2_2.zip

  4. open the archive in 7zip and view the links, which are saved in the body of the files
    for file "Original3.[1].txt" link is to "\?\X:\Original3.txt"
    for folder "a\rel64" link is to "\?\X:\symb\rel"
    and etc.

Expected behavior

  1. There should be an option to write files instead of links (I couldn't find one)
  2. If the files are in the archive, the links should use relative paths to these files ["Original1.txt" in this issue]
  3. If the original files are in an external folder ["Original3.txt" in this issue], then real files should be written instead of links (another option).
  4. Add such archive to test/data to test symlinks/hardlinks/junction with different file/folder locations
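
The second and third expectations can be illustrated with a small path computation. This is a minimal sketch of the requested behavior, not py7zr's implementation; `relative_link_target` is a hypothetical helper that works on path strings only and assumes all paths are on the same drive.

```python
import os


def relative_link_target(root, link_path, target_path):
    """Return target relative to the link's directory, or None if outside root."""
    root = os.path.abspath(root)
    target = os.path.abspath(target_path)
    if os.path.commonpath([root, target]) != root:
        return None  # target lives outside the archived tree
    link_dir = os.path.dirname(os.path.abspath(link_path))
    return os.path.relpath(target, start=link_dir)


print(relative_link_target("/x/symb", "/x/symb/rel/path/link_to_Original1.txt", "/x/symb/Original1.txt"))
print(relative_link_target("/x/symb", "/x/symb/Original3.[1].txt", "/x/Original3.txt"))
```

A None result marks the case where, according to the report, the real file should be stored instead of a link.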

Environment (please complete the following information):

  • OS: [Windows 10]
  • Python 3.8.2 | packaged by conda-forge | (default, Mar 5 2020, 17:29:01) [MSC v.1916 64 bit (AMD64)] on win32
  • py7zr version: [v0.6, commit #1e38828 on master]

Unable to extract 7z archives with long paths

If you create any 7z archive that has a path over 255 characters (8.3 naming convention), the program will fail to unpack it and give a "_lzma.LZMAError". If the archive also contains files that are not nested in so many subdirectories, so that their paths stay within the limit, it will extract those fine; it fails only when it comes across a file whose path within the 7zip archive is longer than 255 characters.

Improve documents

Is your feature request related to a problem? Please describe.
There is a document in the docs directory, but it is poor at the moment.

Describe the solution you'd like
Add more description, especially about decryption and compression.

Feature: Add support for encrypted archives

Is your feature request related to a problem? Please describe.
n/a

Describe the solution you'd like
Fully support compressing and decompressing with password.

Describe alternatives you've considered
n/a

Additional context
n/a

Supporting other compression methods (BZip2)

Is your feature request related to a problem? Please describe.
7z has open architecture, so it can support any new compression methods: LZMA, LZMA2, PPMD, BCJ, BCJ2, BZip2. (source: https://www.7-zip.org/7z.html)

I have a file which was compressed with BZip2 compression method and has the following properties:

Path = FILE_WITH_BZip2.7z
Type = 7z
Physical Size = 716637328
Headers Size = 146
Method = BZip2
Solid = -
Blocks = 1

I tried to decompress this file with the latest v0.6a1 release and I got the following error:

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    print(archive.test())
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/py7zr.py", line 688, in test
    return self._test_digests()
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/py7zr.py", line 528, in _test_digests
    if self._test_unpack_digest():
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/py7zr.py", line 520, in _test_unpack_digest
    self.worker.extract(self.fp) # TODO: print progress
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/compression.py", line 277, in extract
    self.src_start + positions[i + 1])
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/compression.py", line 307, in extract_single
    self.decompress(fp, f.folder, fileish, f.uncompressed[-1], f.compressed, src_end)
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/site-packages/py7zr/compression.py", line 335, in decompress
    raise DecompressionError
py7zr.exceptions.DecompressionError

Describe the solution you'd like
It would be great if the py7zr package supported the BZip2 compression method for file decompression.

Describe alternatives you've considered
I have tried to decompress this file in Python with the bz2 package, but it failed. I'm open to any other suggestions.

import bz2

filepath = '/home/amarkus/Downloads/FILE_WITH_BZip2.7z'
zipfile = bz2.BZ2File(filepath) # open the file
data = zipfile.read() # get the decompressed data

I got the following error:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    data = zipfile.read() # get the decompressed data
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/bz2.py", line 178, in read
    return self._buffer.read(size)
  File "/home/amarkus/.pyenv/versions/3.7.3/lib/python3.7/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream
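
The bz2 module expects a raw bzip2 stream, whereas a .7z file is a container with its own header even when its Method is BZip2, which would explain the "Invalid data stream" error. The difference is visible in the first bytes of the file. The following is a minimal sketch; `sniff_format` is a hypothetical helper.

```python
import bz2

SEVENZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"  # signature at the start of a 7z archive
BZIP2_MAGIC = b"BZh"  # start of a raw bzip2 stream


def sniff_format(header: bytes) -> str:
    """Guess the container from the first bytes of a file."""
    if header.startswith(SEVENZIP_MAGIC):
        return "7z"
    if header.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "unknown"


print(sniff_format(bz2.compress(b"example")))
print(sniff_format(SEVENZIP_MAGIC + b"\x00\x04"))
```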

Additional context
I don't have any.

py7zr extraction error

py7zr failed with the following error when extracting a Qt archive.

  File "/opt/hostedtoolcache/Python/3.6.8/x64/lib/python3.6/site-packages/py7zr/py7zr.py", line 225, in extractall
    self._set_file_property(o, p)
  File "/opt/hostedtoolcache/Python/3.6.8/x64/lib/python3.6/site-packages/py7zr/py7zr.py", line 166, in _set_file_property
    os.utime(outfilename, times=(creationtime, creationtime))
FileNotFoundError: [Errno 2] No such file or directory

Invalid 7z archive created

As mentioned in #41, here is a test case for the broken archive:

https://github.com/fdellwing/py7zr-testcase

I removed the content of the files and removed confidential information.

> ./run_test.py 

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=de_DE.UTF-8,Utf16=on,HugeFiles=on,64 bits,12 CPUs Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz (906EA),ASM,AES-NI)

Scanning the drive for archives:
1 file, 3171 bytes (4 KiB)

Listing archive: test.7z


ERROR: test.7z : test.7z
Open ERROR: Can not open the file as [7z] archive


ERRORS:
Unsupported feature
WARNINGS:
Unsupported feature


Errors: 1

produced 7zip archive has broken header

The test case 'test_py7zr_write_single_close' produces a file named 'target.7z'. Running '7z l target.7z' reports an error in its header, as follows:

$ 7z l pytest-current/test_py7zr_write_single_close0/target.7z 

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=ja_JP.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz (406E3),ASM,AES-NI)

Scanning the drive for archives:
1 file, 120 bytes (1 KiB)

Listing archive: pytest-current/test_py7zr_write_single_close0/target.7z

--
Path = pytest-current/test_py7zr_write_single_close0/target.7z
Type = 7z
ERRORS:
Headers Error
Physical Size = 120
Headers Size = 83
Solid = -
Blocks = 0

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
------------------- ----- ------------ ------------  ------------------------
                                     0            0  0 files

Errors: 1

archiveinfo() UnsupportedCompressionMethodError with BCJ archive

Describe the bug
py7zr throws UnsupportedCompressionMethodError when opening a file and retrieving .archiveinfo()

To Reproduce
Create a BCJ archive: 7z a -mf=BCJ test.7z test.txt (create some test.txt beforehand).
Verify with 7z that it's BCJ - my archive was LZMA:12k BCJ.
Open the file with py7zr and use archiveinfo().

import py7zr
archive = py7zr.SevenZipFile('test.7z', 'r').archiveinfo()

Crash:

File "...\py7zr\compression.py", line 433, in get_methods_names
    methods_names.append(methods_name_map[coder['method']])
KeyError: b'\x03\x03\x01\x03'

Environment (please complete the following information):

Inefficient buffer handling

Describe the bug
This is not a feature request but a performance issue.

py7zr/py7zr/compression.py

Lines 131 to 151 in 9a96bf7

elif len(data) == 0:  # action padding
    self.flushded = True
    padlen = 16 - len(self.buf) % 16
    inp = self.buf + bytes(padlen)
    self.buf = b''
    temp = self.cipher.decrypt(inp)
    return self.lzma_decompressor.decompress(temp, max_length)
else:
    compdata = self.buf + data
    currentlen = len(compdata)
    a = currentlen // 16
    nextpos = a * 16
    if currentlen == nextpos:
        self.buf = b''
        temp = self.cipher.decrypt(compdata)
        return self.lzma_decompressor.decompress(temp, max_length)
    else:
        self.buf = compdata[nextpos:]
        assert len(self.buf) < 16
        temp = self.cipher.decrypt(compdata[:nextpos])
        return self.lzma_decompressor.decompress(temp, max_length)

This buffer handling is quite inefficient: it concatenates bytes objects and slices them on every call. It can lead to low performance when decrypting an archive.
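
One possible direction is to keep the pending data in a single bytearray that is appended to in place, and to hand out only whole 16-byte blocks. The following is a minimal sketch of that idea, not the change made in py7zr; `BlockAligner` is a hypothetical class, and whether it is actually faster would need measuring.

```python
class BlockAligner:
    """Collect incoming chunks and hand out data in multiples of block_size."""

    def __init__(self, block_size: int = 16):
        self.block_size = block_size
        self._buf = bytearray()

    def feed(self, data: bytes) -> bytes:
        """Append data and return the largest block-aligned prefix."""
        self._buf += data  # in-place append instead of building a new bytes object
        aligned = len(self._buf) - len(self._buf) % self.block_size
        out = bytes(self._buf[:aligned])
        del self._buf[:aligned]  # keep only the unaligned tail (< block_size)
        return out

    def flush(self) -> bytes:
        """Return the remaining tail, zero-padded to a whole block."""
        # Unlike the quoted code, an empty tail yields no padding block here.
        padlen = -len(self._buf) % self.block_size
        out = bytes(self._buf) + bytes(padlen)
        self._buf.clear()
        return out


aligner = BlockAligner()
print(len(aligner.feed(b"x" * 20)), len(aligner.flush()))
```

The aligned output of feed() is what would be passed to the cipher's decrypt call in the quoted code.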

Bug in _filelist_retrieve and extractall

Describe the bug
This bug is present when main_streams is None. A check against None is needed in _filelist_retrieve and in extractall.

Environment (please complete the following information):

  • OS: Windows 10
  • Python 3.8
  • py7zr version: v0.5b5

Test data(please attach in the report):
test_data.zip

provide method for extracting specific files from an archive

Is your feature request related to a problem? Please describe.
For a serverless batch process, I have big 7z files floating around (2.5Gb+), which I don't want to extract completely (30Gb+).
Otherwise the serverless approach would be very expensive.

Describe the solution you'd like
Instead I want to selectively extract files.
The argument of the method could be a list of strings, which would be matched against the archive's file list so that only the matches are extracted.
This would enable a leaner solution and focus on the files I am interested in.

Describe alternatives you've considered
There are currently no alternatives, because I already tried to do it via the 7z binary, provisioned by copying the binaries + libraries. :)
It turns out that there is a permission denied error - regardless of what I do.
The support also couldn't assist here.

Test test_zerosize2 fails with py7zr\py7zr.py:373: FileNotFoundError

[CI test](https://ci.appveyor.com/project/miurahr/py7zr/build/job/98iib8pdxy0brq8n) fails with the following error on the topic-zerofile branch:

_______________________________ test_zerosize2 ________________________________
    @pytest.mark.files
    def test_zerosize2():
        archive = py7zr.SevenZipFile(open(os.path.join(testdata_path, 'test_6.7z'), 'rb'))
        tmpdir = tempfile.mkdtemp()
>       archive.extractall(path=tmpdir)
tests\test_advanced.py:160: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
py7zr\py7zr.py:607: in extractall
    self._set_file_property(o, p)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <py7zr.py7zr.SevenZipFile object at 0x000000BC8749AF98>
outfilename = 'C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\tmpi8fz4sea\\5.13.0/msvc2017_64/include/QtScript/qtscript-config.h'
properties = {'archivable': True, 'attributes': 1048608, 'compressed': 0, 'creationtime': ArchiveTimestamp(132052647074275978), ...}
    def _set_file_property(self, outfilename: str, properties: Dict[str, Any]) -> None:
        # creation time
        creationtime = ArchiveTimestamp(properties['lastwritetime']).totimestamp()
        if creationtime is not None:
>           os.utime(outfilename, times=(creationtime, creationtime))
E           FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\tmpi8fz4sea\\5.13.0/msvc2017_64/include/QtScript/qtscript-config.h'
py7zr\py7zr.py:373: FileNotFoundError

Feature: Test integrity of archive

7z has an option 't' to test the integrity of an archive.
It would be better to have this functionality in the library and an option 'i' for the 'py7zr' command line.

ArchiveInfo.solid - Wrong result

Describe the bug
ArchiveInfo.solid gives a wrong result.

To Reproduce
I created an archive of 12 jpeg files, for a total of 32 MB with 7z 19.00, compression level: Fastest, Solid Block Size: 8 MB.
I verified in 7z that the archive is indeed solid (Info button -> solid had a +).

import py7zr
sz = py7zr.SevenZipFile(path)
print(sz.archiveinfo().solid)

Returns False for those archives, despite them being solid.

Environment (please complete the following information):

  • OS: [W7x64]
  • Python [Python 3.8.1 x64]
  • py7zr version: [py7zr-0.5-py3-none-any from pip]

Not all resources are released on close() method

Describe the bug
SevenZipFile.close() closes the opened file object but does not release the header structure and file list.
It should also release the header structure to prevent memory leakage.

Related issue
#77

Expected behavior
Release all resources that SevenZipFile has taken.

multi-folder and encrypted archive extraction error

Describe the bug
There is a bug when extracting a multi-folder, encrypted archive.
The test case 'test_extract_encrypted_2' fails with an error.

Process SpawnProcess-16:
Traceback (most recent call last):
  File "/Users/runner/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/runner/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/runner/runners/2.164.8/work/1/s/py7zr/compression.py", line 206, in extract_single
    self.decompress(fp, f.folder, ofp, f.uncompressed[-1], f.compressed, src_end)
  File "/Users/runner/runners/2.164.8/work/1/s/py7zr/compression.py", line 231, in decompress
    tmp = decompressor.decompress(inp, max_length)
  File "/Users/runner/runners/2.164.8/work/1/s/py7zr/compression.py", line 352, in decompress
    folder_data = self.decompressor.decompress(data, max_length=max_length)
  File "/Users/runner/runners/2.164.8/work/1/s/py7zr/compression.py", line 151, in decompress
    return self.lzma_decompressor.decompress(temp, max_length)
_lzma.LZMAError: Corrupt input data
Process SpawnProcess-17:
Traceback (most recent call last):
  File "/Users/runner/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/runner/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/runner/runners/2.164.8/work/1/s/py7zr/compression.py", line 206, in extract_single
    self.decompress(fp, f.folder, ofp, f.uncompressed[-1], f.compressed, src_end)
  File "/Users/runner/runners/2.164.8/work/1/s/py7zr/compression.py", line 231, in decompress
    tmp = decompressor.decompress(inp, max_length)
  File "/Users/runner/runners/2.164.8/work/1/s/py7zr/compression.py", line 352, in decompress
    folder_data = self.decompressor.decompress(data, max_length=max_length)
  File "/Users/runner/runners/2.164.8/work/1/s/py7zr/compression.py", line 151, in decompress
    return self.lzma_decompressor.decompress(temp, max_length)
_lzma.LZMAError: Corrupt input data

Related issue
#70

To Reproduce
Run the test case 'test_extract_encrypted_2'.

Expected behavior
The archive is extracted correctly.

Bug: Support compression of zero sized files

Describe the bug
The produced archive is invalid.

Related issue
#46

To Reproduce
Steps to reproduce the behavior:

  1. Prepare test data by 'touch file'
  2. Compress it as test.7z with py7zr
  3. Running '7z l test.7z' reports an error.

Expected behavior
When archiving a zero-sized file, "emptystream" should be True, but it is currently False.

Environment (please complete the following information):

  • OS: Mint Linux
  • Python: 3.7
  • py7zr version: v0.5b3

AsyncIO friendly implementation

py7zr is currently neither thread-safe nor asyncio friendly.
This prevents its use in multi-threaded applications and in network applications built on aiohttp or asyncio.

It would be great if py7zr became asyncio friendly.

The lzma module in the Python core is not thread-safe either, so we may need some tricks to achieve this.
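
Until that happens, one possible workaround on the caller's side is to run the blocking archive work in a worker thread and serialize it with a lock, so that only one extraction runs at a time while the event loop stays responsive. This is a minimal sketch under assumptions: `blocking_extract` is a hypothetical stand-in for a blocking, non-thread-safe archive call, not a py7zr function.

```python
import asyncio
import time


def blocking_extract(archive_name: str) -> str:
    # Hypothetical stand-in for a blocking, non-thread-safe archive call.
    time.sleep(0.01)
    return f"extracted {archive_name}"


async def extract_async(archive_name: str, lock: asyncio.Lock) -> str:
    # The lock lets only one blocking call run at a time; the call itself
    # runs in the default thread pool so the event loop is not blocked.
    async with lock:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, blocking_extract, archive_name)


async def main() -> list:
    lock = asyncio.Lock()
    # gather() returns results in the order the awaitables were passed.
    return await asyncio.gather(
        extract_async("a.7z", lock),
        extract_async("b.7z", lock),
    )


results = asyncio.run(main())
print(results)
```

The lock trades concurrency for safety: extractions queue up behind each other, but other coroutines (for example network handlers) keep running in the meantime.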

Relative output path cause producing broken symbolic link

Describe the bug
As reported in the downstream issue miurahr/aqtinstall#88,
when the output path is given as a relative directory, the extracted symbolic link is broken.

Related issue
N.A.

To Reproduce

Here is a test case that shows the expected result:

@pytest.mark.files
@pytest.mark.skipif(sys.platform.startswith("win"), reason="Normal user is not permitted to create symlinks.")
def test_extract_symlink_with_relative_target_path(tmp_path):
    archive = py7zr.SevenZipFile(open(os.path.join(testdata_path, 'symlink.7z'), 'rb'))
    os.chdir(tmp_path)
    os.makedirs(tmp_path.joinpath('target'))
    archive.extractall(path='target')
    assert os.readlink(tmp_path.joinpath('target/lib/libabc.so.1.2')) == 'libabc.so.1.2.3'

Expected behavior

The test above should pass, but it fails with:

>       assert os.readlink(tmp_path.joinpath('target/lib/libabc.so.1.2')) == 'libabc.so.1.2.3'
E       AssertionError: assert 'target/lib/libabc.so.1.2.3' == 'libabc.so.1.2.3'
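
The failure is consistent with how POSIX symbolic links behave: a link stores its target string verbatim, and a relative target is resolved against the directory that contains the link, not against the current working directory. The sketch below uses only the standard library (it does not call py7zr, and it assumes a platform where the user may create symlinks). The file names mirror the test above, except `libabc.so.1`, which is an illustrative name for the second link.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    lib = os.path.join(base, "target", "lib")
    os.makedirs(lib)
    with open(os.path.join(lib, "libabc.so.1.2.3"), "w") as f:
        f.write("dummy")

    # Target relative to the link's own directory: resolves correctly.
    good = os.path.join(lib, "libabc.so.1.2")
    os.symlink("libabc.so.1.2.3", good)

    # Target prefixed with the output path: resolved as
    # <base>/target/lib/target/lib/libabc.so.1.2.3, which does not exist.
    bad = os.path.join(lib, "libabc.so.1")
    os.symlink("target/lib/libabc.so.1.2.3", bad)

    good_target = os.readlink(good)
    good_resolves = os.path.exists(good)  # exists() follows the link
    bad_resolves = os.path.exists(bad)

print(good_target, good_resolves, bad_resolves)
```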

Environment (please complete the following information):

  • OS: Linux

Test data(please attach in the report):
N.A.

Additional context
N.A.

Extract archive callback.

Is your feature request related to a problem? Please describe.
I need to compute extraction progress and to process each file as soon as it has been extracted.

Describe the solution you'd like
Add a callback parameter to the extractall function.

Describe alternatives you've considered
None

Additional context
None
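
To illustrate the request, here is a minimal sketch of a per-file callback. Everything in it is hypothetical: `extract_all`, the callback signature and the entry list are stand-ins for illustration and do not describe py7zr's API.

```python
from typing import Callable, Iterable, List, Tuple

# (name, size in bytes) pairs standing in for archive entries.
Entry = Tuple[str, int]


def extract_all(entries: Iterable[Entry],
                callback: Callable[[str, int, int], None]) -> None:
    """Hypothetical extraction loop that reports after every file."""
    entries = list(entries)
    total = sum(size for _, size in entries)
    done = 0
    for name, size in entries:
        # ... the real extraction of one file would happen here ...
        done += size
        callback(name, done, total)


progress: List[str] = []


def on_file_extracted(name: str, done: int, total: int) -> None:
    # Guard against an archive that contains only empty files.
    percent = 100 * done // total if total else 100
    progress.append(f"{name}: {percent}%")


extract_all([("a.txt", 25), ("b.txt", 75)], on_file_extracted)
print(progress)
```

Passing the cumulative and total byte counts lets the caller compute progress, while the file name lets it post-process each file as soon as it is available.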

No password encrypted on write

I tried to write files to a compressed archive with a password set, but the files were not accessible.
Code snippet:
c7zip_file = py7zr.SevenZipFile(c7z_file_name, mode='w', password='somepassword')
for fname in compressed_files:
    c7zip_file.write(fname)
c7zip_file.close()

Extract single file from archive

Is your feature request related to a problem? Please describe.
Sometimes just one file from a large archive must be extracted.

Describe the solution you'd like
Add an extract_single method to the SevenZipFile class.

Describe alternatives you've considered
None

Additional context
None

Test case github_14_multi sometimes fails

Some tests fail randomly.
This is because concurrent extraction cannot guarantee an order of extraction.


    @pytest.mark.files
    def test_github_14_multi():
        """ multiple unnamed objects."""
        archive = py7zr.SevenZipFile(open(os.path.join(testdata_path, 'github_14_multi.7z'), 'rb'))
        assert archive.getnames() == ['github_14_multi', 'github_14_multi']
        tmpdir = tempfile.mkdtemp()
        archive.extractall(path=tmpdir)
>       assert open(os.path.join(tmpdir, 'github_14_multi'), 'rb').read() == bytes('Hello GitHub issue #14 2/2.\n', 'ascii')
E       AssertionError: assert b'Hello GitHu...ue #14 1/2.\n' == b'Hello GitHub...ue #14 2/2.\n'
E         At index 23 diff: 49 != 50
E         Full diff:
E         - b'Hello GitHub issue #14 1/2.\n'
E         ?                          ^
E         + b'Hello GitHub issue #14 2/2.\n'
E         ?                          ^

tests\test_advanced.py:65: AssertionError
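
Both archive entries share the name github_14_multi, so whichever extraction finishes last determines the final content of the file. One possible way to make the check independent of extraction order is to accept either payload. This is a sketch using only the standard library: the helper `content_is_one_of` is hypothetical, and the file is written directly here instead of being extracted.

```python
import os
import tempfile

# Both payloads appear in the failure output above.
EXPECTED = {
    b"Hello GitHub issue #14 1/2.\n",
    b"Hello GitHub issue #14 2/2.\n",
}


def content_is_one_of(path, candidates):
    """Return True if the file content equals any of the candidates."""
    with open(path, "rb") as f:
        return f.read() in candidates


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "github_14_multi")
    # Simulate the case where entry 1/2 happened to be written last.
    with open(path, "wb") as f:
        f.write(b"Hello GitHub issue #14 1/2.\n")
    ok = content_is_one_of(path, EXPECTED)

print(ok)
```

The alternative is to make extraction itself deterministic, for example by writing entries with colliding names in archive order.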
