vstinner / hachoir
Hachoir is a Python library to view and edit a binary stream field by field.
Home Page: http://hachoir.readthedocs.io/
License: GNU General Public License v2.0
version 3.0a3
Tried parsing and reading the metadata of a TGA image, and received the following warning:
[warn] Skip parser 'TargaFile': Unknown bits/pixel value 32
As titled: is there any way to add new fields to the file metadata?
Example (not working):
# ....
parser = createParser(file_content)
md5_sum = md5(file_content.getbuffer()).hexdigest()
metadata = extractMetadata(parser)
if metadata:
    md5_field = MissingField("md5", str(md5_sum))
    metadata.add(md5_field)
metadata = self._list_to_dict(metadata.exportPlaintext())
return metadata
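For reference, a workaround sketch that sidesteps MissingField entirely: compute the hash separately and merge it into the result after extraction. metadata_with_md5 is a hypothetical helper, not a hachoir API.

from hashlib import md5
from hachoir.parser import createParser
from hachoir.metadata import extractMetadata

def metadata_with_md5(filename):
    # Hash the raw bytes independently of the parse.
    with open(filename, "rb") as f:
        digest = md5(f.read()).hexdigest()
    parser = createParser(filename)
    metadata = extractMetadata(parser) if parser else None
    lines = metadata.exportPlaintext() if metadata else []
    return {"md5": digest, "metadata": lines}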
Hachoir (from Debian 3.1.0+dfsg-3) can parse the structure of a FAT image created by mkfs.msdos (showing the first 0x200 bytes as Boot, then both FAT copies, and the root directory), but on an image created by mtools (specifically mformat -i "test-mtools.img" ::.) it reports a MasterBootRecord partition table followed by only RawBytes, whereas those should be identified as a FAT.
Could the presence of a non-empty partition table be causing two different dissectors to be used?
W0631: Using possibly undefined loop variable 'index' (undefined-loop-variable)
hachoir/regex/regex.py, line 506 in 4403b0c
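For context, a generic illustration of the pattern W0631 flags (not the actual code at regex.py:506):

# If the loop body never runs, 'index' is unbound at the final use.
items = []
for index, item in enumerate(items):
    if item == "target":
        break
print(index)  # NameError when items is empty; pylint reports W0631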
At first glance the problem is here, because self.parent._name is going to be something like SimpleBlock[0].
Hi, I've noticed that the DateTime object returned by extractMetadata(parser).get('creation_date') does not contain timezone information, and I was not able to find it anywhere else in the extracted metadata. Is this information actually stored in the movie file and the library is ignoring it?
Followup:
https://stackoverflow.com/questions/45107320/parsing-an-iso-file-with-hachoir
Is it possible to iterate over these hachoir.parser.file_system.iso9660.Volume items and list the filenames inside the iso file?
Some posts on Stack Overflow suggested using hachoir-wx.
Those posts linked to Bitbucket: https://bitbucket.org/haypo/hachoir/wiki/hachoir-wx - this repository is down.
Then, according to https://directory.fsf.org/wiki/Hachoir_project-_hachoir_wx, there was a website http://www.hachoir.org/wiki/hachoir-wx that is also down.
So is this now the official repo, or is this some fork project?
I tried to run hachoir-wx and nothing happened after pythonw started. May I suggest adding at least some print output when wxPython is not installed, telling the user to install it?
I'm attempting to extract the Microsoft Core Fonts for the Web in Python. From the "How to extract a windows cabinet file in python" Stack Overflow question, I learned about hachoir.
hachoir happily parses the self-extracting exe file, and I was able to extract something (a /section_rsrc stream) that is happily accepted by cabextract. However, hachoir won't parse it:
>>> cab = createParser('rsrc.cab')
[warn] Skip parser 'CabFile': Invalid magic
Stripping all data before the CAB header using a hex editor, I can get hachoir to parse the CAB file. Does hachoir offer the means to extract this CAB file, without leading/trailing data? Or is that something that I need to look up in a Microsoft specification document?
Same question for extracting the files from the CAB file; does hachoir offer the abstraction level to do this?
Thanks!
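One possible approach, sketched under the assumption that the embedded CAB starts at the MSCF magic and runs to the end of the extracted resource data (file names are hypothetical):

from hachoir.stream import FileInputStream

# Slice out the embedded CAB by searching for its magic. Trailing data, if
# any, would still need trimming using the cbCabinet length field in the
# CAB header (see the Microsoft CAB specification).
stream = FileInputStream("fontinstaller.exe")
data = stream.readBytes(0, stream.size // 8)  # stream.size is in bits
offset = data.find(b"MSCF")                   # CAB file magic
if offset >= 0:
    with open("rsrc.cab", "wb") as out:
        out.write(data[offset:])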
hachoir/parser/guess.py
def createParser(filename, real_filename=None, tags=None):
    """
    Create a parser from a file or returns None on error.

    Options:
    - file (str|io.IOBase): Input file name or a byte io.IOBase stream;
    - real_filename (str): Real file name.
    """
    if not tags:
        tags = []
    stream = FileInputStream(filename, real_filename, tags=tags)
    guess = guessParser(stream)
    if guess is None:
        stream.close()
    return guess
You should return the stream along with the guess, and let the caller close the stream.
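Until that changes, a workaround sketch: build the stream yourself and call guessParser() directly, so you keep the reference and control when it is closed.

from hachoir.stream import FileInputStream
from hachoir.parser import guessParser

stream = FileInputStream("example.bin")  # hypothetical file name
try:
    parser = guessParser(stream)
    if parser is None:
        print("Unable to guess a parser")
    else:
        pass  # use the parser while the stream is still open
finally:
    stream.close()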
λ hachoir-metadata
'hachoir-metadata' is not recognized as an internal or external command,
operable program or batch file.
but it's in PATH
λ which hachoir-metadata
/c/Users/JayXon/AppData/Local/Programs/Python/Python36/Scripts/hachoir-metadata
I have to do this:
λ python C:/Users/JayXon/AppData/Local/Programs/Python/Python36/Scripts/hachoir-metadata
Usage: hachoir-metadata [options] files
Options:
-h, --help show this help message and exit
--type Only display file type (description)
--mime Only display MIME type
--level=LEVEL Quantity of information to display from 1 to 9 (9 is
the maximum)
--raw Raw output
--bench Run benchmark
--force-parser=FORCE_PARSER
List all parsers then exit
--parser-list List all parsers then exit
--profiler Run profiler
--version Display version and exit
--quality=QUALITY Information quality (0.0=fastest, 1.0=best, and
default is 0.5)
--maxlen=MAXLEN Maximum string length in characters, 0 means unlimited
(default: 300)
--verbose Verbose mode
--debug Debug mode
I think the reason is that Windows doesn't support #! in scripts; you might have to create a batch script for this to work.
For example, a hachoir-metadata.bat file in the same directory as hachoir-metadata, with something like this, works for me:
@python %~dp0hachoir-metadata
Or a hachoir-metadata.bat file like this, which doesn't need the hachoir-metadata script:
@python -c __import__('hachoir.metadata.main').metadata.main.main()
Automatic builds could be convenient: I had to click the 'Build' button on Read the Docs before it reflected the most recent changes. Before that, it was showing docs from a few months back.
It's been more than a year since the last release, and there have been significant fixes since then, for example b547efa. Maybe it's time for a release?
I am aware that this project is unmaintained, but my project depends on the fixes introduced following #65. I was planning on simply using a URL in my dependency specification (https://github.com/vstinner/hachoir/archive/8000dbeb9aad587e8dc5be8202796cdfb67f899e.zip), but PyPI will not accept such a dependency.
I was hoping you could release a new version of hachoir including these fixes, if you can find the time. Alternatively, I can fork the project and publish it as e.g. hachoir-reloaded. I'd rather not fork though, if that can be avoided.
The original Python2-only "Hachoir" project hosted on Bitbucket didn't get much love:
I propose to rename Hachoir3 to Hachoir:
Hopefully, releases of the old Python2-only Hachoir project can co-exist, since they were published under different names (hachoir-core, hachoir-metadata, etc.).
Command:
hachoir-metadata /..../00014.MTS
Expected result:
Metadata extracted correctly
Actual result:
[err!] [<MPEG_TS>] Hachoir can't extract metadata, but is able to parse: /..../00014.MTS
Sorry for the poor feature request.
At string_field.py:142, we find the following:
if not (1 <= nbytes <= 0xffff):
Empty strings occur often in real files - think, for example, of empty string constants (where I first encountered this issue). Although it seems odd to have a zero-length field, Hachoir seems to deal with this fine.
Any objections to me changing the line to if not (0 <= nbytes <= 0xffff): ?
Ubuntu 16.04
Python 3.8
Flac file: https://files.catbox.moe/gafg3k.flac
file_hash: 138ae53711c6ec55ee88c9e8f54c846e469649c1bc16d5011786b1d70d143828
In [21]: with hachoir.parser.createParser("./gafg3k.flac") as parser:
    ...:     result = hachoir.metadata.extractMetadata(parser)
    ...:
[warn] [/metadata] Duplicate field name Key 'stream_info' already exists
...
[warn] [/metadata] Duplicate field name Key 'stream_info' already exists
---------------------------------------------------------------------------
UniqKeyError                              Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/hachoir/field/generic_field_set.py in _addField(self, field)
    193         try:
--> 194             self._fields.append(field._name, field)
    195         except UniqKeyError as err:
/usr/local/lib/python3.8/dist-packages/hachoir/core/dict.py in append(self, key, value)
     66         if key in self._index:
---> 67             raise UniqKeyError("Key '%s' already exists" % key)
     68         self._index[key] = len(self._value_list)
UniqKeyError: Key 'stream_info' already exists

During handling of the above exception, another exception occurred (after pressing Ctrl-C):

KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-21-a7157485c236> in <module>
      1 with hachoir.parser.createParser("./gafg3k.flac") as parser:
----> 2     result = hachoir.metadata.extractMetadata(parser)
KeyboardInterrupt
I'm running:
Python 3.7.0 (default, Jul 23 2018, 20:22:55)
macOS 10.13.6 (17G65)
bash
I did:
pip3 install hachoir-wx
(which was successful)
But running it I get:
File "/usr/local/bin/hachoir-wx", line 27
print "%s version %s" % (PACKAGE, VERSION)
^
SyntaxError: invalid syntax
Any ideas? Thanks
A truncated jpeg can have a JpegImageData field with no terminator, which is created without a known size.
Because the size isn't known the corrupted JpegImageData must be parsed in full to calculate its size when the field is added to its parent JpegFile during JpegFile parsing. This forces simple operations that don't care about the JpegImageData, like checking if a field with a given name is in the JpegFile, to parse the corrupted JpegImageData fully. Parsing the corrupted section can blow up the memory use of the parser as it tries to parse the entire rest of the file in small chunks.
An example file that causes this issue can be found here: https://github.com/CybercentreCanada/assemblyline-service-characterize/issues/12.
This jpeg truncated to 500 000 bytes consumes approximately 1 GB of memory parsing JpegHuffmanUnits until it reaches the end of the file and errors. This happens when extracting metadata, or whenever checking for a field name that isn't in the jpeg.
The README currently says the library supports Python 3.3+ (EOL 2017-09-29). If minimum compatibility is set to at least 3.5, then we could make use of type annotations, which can simplify docstrings and make the library more IDE-friendly. 3.6 would be a minor improvement over 3.5, since f-strings offer a less verbose alternative to the %-formatting currently used in several places.
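For illustration, the kind of change this would enable (hypothetical variable names, not a real hachoir call site):

name, size = "signature", 4
old = "field %s has size %s bytes" % (name, size)  # current %-formatting style
new = f"field {name} has size {size} bytes"        # f-string, Python 3.6+
assert old == new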
Hi, do you mind making those ☝️ versions available on PyPI, please?
Thanks in advance.
To reproduce:
Press C-e to save a field:
Traceback (most recent call last):
File "cmd.py", line 4, in <module>
main()
File "/home/chrahunt/.local/lib/python3.5/site-packages/hachoir/urwid_ui.py", line 831, in main
"display_value": values.display_value,
File "/home/chrahunt/.local/lib/python3.5/site-packages/hachoir/urwid_ui.py", line 740, in exploreFieldSet
ui.run_wrapper(run)
File "/usr/local/lib/python3.5/dist-packages/urwid/display_common.py", line 763, in run_wrapper
return fn()
File "/home/chrahunt/.local/lib/python3.5/site-packages/hachoir/urwid_ui.py", line 663, in run
e = top.keypress(size, e)
File "/usr/local/lib/python3.5/dist-packages/urwid/container.py", line 1116, in keypress
return self.footer.keypress((maxcol,),key)
File "/usr/local/lib/python3.5/dist-packages/urwid/container.py", line 1587, in keypress
key = self.focus.keypress(tsize, key)
File "/home/chrahunt/.local/lib/python3.5/site-packages/hachoir/urwid_ui.py", line 533, in keypress
self._done(self.get_edit_text())
File "/home/chrahunt/.local/lib/python3.5/site-packages/hachoir/urwid_ui.py", line 224, in <lambda>
raise NeedInput(lambda path: self.save_field(path, key == 'ctrl e'),
File "/home/chrahunt/.local/lib/python3.5/site-packages/hachoir/urwid_ui.py", line 388, in save_field
copyfileobj(stream.file(), os.fdopen(fd, 'wb'))
File "/usr/lib/python3.5/shutil.py", line 73, in copyfileobj
buf = fsrc.read(length)
File "/home/chrahunt/.local/lib/python3.5/site-packages/hachoir/stream/input.py", line 85, in read
if size is None or None < self._size < pos + size:
TypeError: unorderable types: NoneType() < int()
pointing to here.
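The failing comparison reads like a Python 2 idiom, where None < x was always true for an int x. A py3-safe rewrite, as an assumption about the original intent rather than the maintainers' actual fix:

def needs_clamping(pos, size, stream_size):
    # size is None means "read to the end"; stream_size is None means the
    # stream size is unknown, so there is no upper bound to enforce.
    return size is None or (stream_size is not None
                            and stream_size < pos + size)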
The Pypi entry for https://pypi.python.org/pypi/hachoir-metadata/1.3.3 shows the BitBucket as http://bitbucket.org/haypo/hachoir/wiki/hachoir-metadata, which leads to a 404.
I don't know if it's an out-of-date project that was mimicking your hachoir-metadata or if it's your PyPI entry with the wrong site. Either way, I figured I'd let you know about it.
Edit: I found the same bug on http://hachoir3.readthedocs.io/. Perhaps it's a deprecated BitBucket account then?
When I process multiple files, it often requires command-line tools to create a CSV/JSON pipeline. Right now the hachoir-metadata tool can't be used to create CSV or JSON.
It would be very helpful if hachoir-metadata and the other command-line tools produced machine-readable results.
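Until the CLI grows such an option, a workaround sketch that drives the library directly and emits JSON (keeping exportPlaintext() lines verbatim rather than guessing at key/value parsing):

import json
import sys
from hachoir.parser import createParser
from hachoir.metadata import extractMetadata

results = {}
for filename in sys.argv[1:]:
    parser = createParser(filename)
    meta = extractMetadata(parser) if parser else None
    if meta is not None:
        results[filename] = [line.strip() for line in meta.exportPlaintext()]
print(json.dumps(results, indent=2, ensure_ascii=False))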
I am working on a project where I need all the tags present in PNG/JPEG and other similar file formats. I found that the hachoir-urwid utility does the job: when I open a file with hachoir-urwid it shows all the headers/tags/data, with annotations at every level.
My question: it gives the output in an interactive manner, so I have to press enter at every '+' to expand the inner details further.
Can someone please help me out? I am unable to find a utility which can directly give me the complete output on stdout instead of an interactive screen.
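A non-interactive sketch, assuming hachoir's public field attributes (name, display, is_field_set): walk the parsed tree recursively and print every field to stdout.

from hachoir.parser import createParser

def dump(fieldset, indent=0):
    # Print each field's name and display value, recursing into sub-field-sets.
    for field in fieldset:
        print("  " * indent + "%s: %s" % (field.name, field.display))
        if field.is_field_set:
            dump(field, indent + 1)

parser = createParser("image.png")  # hypothetical input file
if parser:
    dump(parser)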
hachoir[urwid] requires urwid==1.3.1 (released on 2015-11-02)
This produces distutils.errors.DistutilsSetupError: use_2to3 is invalid.
use_2to3 was indeed removed in setuptools 58 (released on 2021-09-05).
Software environment:
Commands run:
$ python3 -m venv env-hachoir
$ cd env-hachoir
$ . bin/activate
$ pip install hachoir
Successfully installed hachoir-3.3.0
$ pip install hachoir[urwid]
Requirement already satisfied: hachoir[urwid] in ./lib/python3.10/site-packages (3.3.0)
Collecting urwid==1.3.1 (from hachoir[urwid])
Downloading urwid-1.3.1.tar.gz (588 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
exit code: 1
Traceback (most recent call last):
(...)
File "/home/(...)/env-hachoir/lib/python3.10/site-packages/setuptools/dist.py", line 139, in invalid_unless_false
raise DistutilsSetupError(f"{attr} is invalid.")
distutils.errors.DistutilsSetupError: use_2to3 is invalid.
(...)
A workaround (or solution) is to require a more recent (or most recent) version of urwid.
hachoir-urwid works fine for me with the latest urwid (2.6.7).
Actually, it would be helpful to parse DOCX/PPTX/XLSX and all the other OpenXML formats too.
I'm using Yomu, a Ruby library over Tika (https://github.com/Erol/yomu), but it's Ruby, and Python is much easier for me to understand.
A Builder API allowing a user to construct binary data according to a defined set of FieldSets.
Taking an example from the docs:
https://hachoir.readthedocs.io/en/latest/developer.html#parser-with-sub-field-sets
The user should be able to create the value of the data variable roughly as follows:
stream = BuilderByteStream()
creatable = MyFormat(stream)
creatable["signature"].value = b"MYF"
creatable["count"].value = len(pointlist)
for subfieldset, point in zip(creatable["point"], pointlist):
    subfieldset["letter"].value = point["letter"]
    subfieldset["code"].value = point["code"]
data = stream.to_bytes()
For some reason, hachoir-metadata has been producing bad duration output for ogv files. I've been testing with the "Computer Chronicles" collection on archive.org. To reproduce:
wget https://archive.org/download/CC517_commodore_64/CC517_commodore_64.ogv
from hachoir import parser as hachoir_parser
from hachoir import metadata as hachoir_metadata
video_file = 'CC517_commodore_64.ogv'
parser = hachoir_parser.createParser(video_file,video_file)
hachoir_metadata.extractMetadata(parser).exportPlaintext()
['Common:', '- Title: Commodore 64', '- Duration: 1 min 14 sec 423 ms', '- Location: http://www.archive.org/details/CC517_commodore_64', '- Copyright: http://creativecommons.org/licenses/by-nc-nd/2.0/', '- Producer: Xiph.Org libTheora I 20081020 3 2 1', '- MIME type: video/theora', '- Endianness: Little endian', 'Video:', '- Image width: 400 pixels', '- Image height: 300 pixels', '- Pixel format: 4:2:0', '- Compression: Theora', '- Frame rate: 30.0 fps', '- Comment: Quality: 0', '- Format version: Theora version 3.2 (revision 1)', 'Audio:', '- Channel: stereo', '- Sample rate: 44.1 kHz', '- Compression: Vorbis', '- Format version: Vorbis version 0']
hachoir-metadata reports that the duration is 1 minute and 14 seconds, but if you open it up in VLC and watch it, it's actually 28 minutes and 31 seconds.
Using Python 3.8.2 and hachoir 3.1.1.
https://github.com/vstinner/hachoir/blob/master/hachoir/parser/misc/pdf.py#L395-L396
=>
%(Trailer.MAGIC, self.absolute_address // 8))
Hum, previously I configured Travis CI to only send email notifications to me. It seems like @nneonneo broke a test but didn't get a notification.
@nneonneo: can you please try to run "tox" to run tests before pushing a change?
It seems that the regression was introduced by commit b0306e6, according to git bisect.
1.3.3 was the latest version that worked on py2, and I have been using it for years with thousands of users.
Did someone remove the hachoir-core, hachoir-parser, and hachoir-metadata packages from PyPI for a reason?
3.x is py3-only, obviously.
Using hachoir commit 5b9e05a on Windows 10 x64.
Steps (test.py is shown below)...
BACKGROUND: download fanart.jpg (the jpg image crc is BA866C09) from the reputable source https://fanart.tv/series/331821/the-looming-tower/ (https://fanart.tv/api/download.php?type=download&image=88183&section=1).
Python -V   # output: Python 3.7.4 (32 bit)
python.exe test.py
Result: an infinite loop exhausting memory until Python crashes with MemoryError.
[warn] [/exif/content/ifd[0]] [Autofix] Fix parser error: stop parser, found unparsed segment: start 1408, length 8, found unparsed segment: start 1480, length 8, found unparsed segment: start 1552, length 8, found unparsed segment: start 1712, length 16
[warn] [/exif/content/ifd[1]] [Autofix] Fix parser error: stop parser, found unparsed segment: start 1408, length 8, found unparsed segment: start 1480, length 8, found unparsed segment: start 1552, length 8, found unparsed segment: start 1712, length 16
[warn] [/exif/content/ifd[2]] [Autofix] Fix parser error: stop parser, found unparsed segment: start 1408, length 8, found unparsed segment: start 1480, length 8, found unparsed segment: start 1552, length 8, found unparsed segment: start 1712, length 16
...
and so on...
...
[warn] [/exif/content/ifd[231]] [Autofix] Fix parser error: stop parser, found unparsed segment: start 1408, length 8, found unparsed segment: start 1480, length 8, found unparsed segment: start 1552, length 8, found unparsed segment: start 1712, length 16
...
and so on...
The test.py script is located next to the cloned hachoir3 folder (or whatever it is named).
import os
import sys

HACHOIR_CLONE_PATH = 'hachoir3'
sys.path.insert(1, os.path.abspath(os.path.join(os.path.dirname(__file__),
                                                HACHOIR_CLONE_PATH)))

from hachoir import parser
from hachoir import metadata

path = os.path.abspath(os.path.join(os.path.dirname(__file__), 'fanart.jpg'))
try:
    parser = parser.createParser(path)
    metadata = metadata.extractMetadata(parser)
except Exception as e:
    print('Unable to extract metadata %r' % e)
hachoir-metadata returns the "Company" field named as "NumWords".
I have the file "o_gosmonitor_.doc" as an example; it includes the name of an organization, "ИВЦ Минприроды" (Russian), and when I use hachoir-metadata against it I see the mislabeled field. It affects any .doc file:
λ hachoir-metadata o_gosmonitor_.doc
Metadata:
For the NTFS parser I recently learned about the "update sequence array", which is a set of binary patches applied to specific offsets in the stream. Essentially, suppose you have a USA value of "05 00, 6F 20". Then, later in the file, you might see
68 65 6C 6C 05 00 77 6F 72 6C 64
(with the 05 00 at a specific offset - 510 bytes from the start of a 512-byte sector). You are supposed to apply the patch here, to fix it up into
68 65 6C 6C 6F 20 77 6F 72 6C 64
The patches are always at predictable addresses, but in general they could land in the middle of any field set, causing massive breakage when the NTFS parser attempts to parse the USA substitute values instead of the intended bytes.
Is there a way I could temporarily patch the stream reader to return the correct bytes, or is there some other option for properly handling this kind of weakly context-dependent stream patching?
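Lacking a stream-patching hook, one pre-processing approach is to apply the fixups to the record bytes before handing them to a parser. A sketch of the rule described above (wiring this into hachoir's InputStream is left open):

def apply_usa_fixups(record, usa_offset, usa_count, sector_size=512):
    # The update sequence array holds the USN followed by usa_count - 1
    # replacement words; the USN is stamped over the last two bytes of each
    # sector on disk and must be swapped back before parsing.
    buf = bytearray(record)
    usn = record[usa_offset:usa_offset + 2]
    for i in range(1, usa_count):
        patch = record[usa_offset + 2 * i:usa_offset + 2 * i + 2]
        pos = sector_size * i - 2
        if buf[pos:pos + 2] != usn:
            raise ValueError("corrupt record: update sequence number mismatch")
        buf[pos:pos + 2] = patch
    return bytes(buf)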
Calling guess.py:createParser() with an invalid file type leaves an open FileInputStream object.
Code is:
stream = FileInputStream(filename, real_filename, tags=tags)
return guessParser(stream)
Probably should be something like:
stream = FileInputStream(filename, real_filename, tags=tags)
guess = guessParser(stream)
if not guess:
    stream.close()
return guess
Currently hachoir-urwid and maybe other utilities have unstated external dependencies. For hachoir-urwid this is a problem because a plain pip install urwid will install version 2.x, which seems to be incompatible (see #34). If we pass something like
'extras_require': {
    'urwid': [
        'urwid==1.3.1',
    ],
}
to the setup method in setup.py, then users could pip install hachoir[urwid] and be assured they get a tried and tested version.
When running hachoir-urwid with urwid 2.0.1, any warnings that typically appear at the bottom of the interface (below the tab toolbar) instead appear as blank lines. I have highlighted these lines in my terminal and tried to copy them, but they do not appear to contain any text.
I am using the latest version of hachoir from master.
Running hachoir-urwid tests/files/mev.64bit.big.elf:
With urwid 1.3.1:
With urwid 2.0.1:
There are several useful scripts in the root of the repository, like hachoir-urwid. It would be cool if these were packaged in such a way that they would be available when hachoir is installed (as described here). Ideally:
pip install hachoir3
hachoir-urwid
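A sketch of the setuptools console_scripts mechanism this refers to; the module path is an assumption based on the current layout (hachoir/urwid_ui.py exposing main()):

from setuptools import setup

setup(
    name="hachoir3",
    # ... other metadata ...
    entry_points={
        "console_scripts": [
            "hachoir-urwid = hachoir.urwid_ui:main",
        ],
    },
)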
Hello,
Given that only a few commits are needed to fix this, please reconsider maintaining compatibility with PY2, as there are many PY2 installs that would benefit from hachoir updates.
Thank you.
@nneonneo Sorry to bother you again, but for verdan32.exe, two of the extracted TTF files differ in 1 (Verdanai.TTF) or 2 bytes (Verdanab.TTF) at position 0x56 from those extracted by cabextract.
The other core font installers seem to extract fine, but I haven't verified all the files.
Is there any example code for this Python wrapper, hachoir?
Is it possible? A Python implementation for PDF exists: the PyPDF2 library (https://github.com/mstamy2/PyPDF2).
Perhaps it could be implemented so that hachoir-metadata supports PDF too?
Sometimes I need to process hundreds of thousands of files inside multiple directories. It's not so simple right now, and a parameter like "--filename" could help.
Just like the FIDO tool (https://github.com/openpreserve/fido), which has very similar file-identification purposes.
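In the meantime, a stopgap sketch using os.walk (directory and pattern are hypothetical):

import fnmatch
import os
from hachoir.parser import createParser
from hachoir.metadata import extractMetadata

for root, _dirs, files in os.walk("photos/"):
    for name in fnmatch.filter(files, "*.jpg"):
        path = os.path.join(root, name)
        parser = createParser(path)
        if parser is None:
            continue
        meta = extractMetadata(parser)
        if meta:
            print(path, meta.get("width"), meta.get("height"))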
Error output:
$ python3 -m hachoir.subfile DSN-CTL-V23R01.exe
[+] Start search on 6475848 bytes (6.2 MB)
[+] File at 0 size=57344 (56.0 KB): Microsoft Windows Portable Executable: Intel 80386, Windows GUI
[!] Memory error!
[+] End of search -- offset=524288 (512.0 KB)
Total time: 676 ms -- global rate: 756.7 KB/sec
Can be reproduced with https://dsn-ctl.fr/DSN-CTL-V23R01.exe
$ wget https://dsn-ctl.fr/DSN-CTL-V23R01.exe
--2023-07-04 13:19:30-- https://dsn-ctl.fr/DSN-CTL-V23R01.exe
Resolving dsn-ctl.fr (dsn-ctl.fr)... 85.236.158.186
Connecting to dsn-ctl.fr (dsn-ctl.fr)|85.236.158.186|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6475848 (6.2M) [application/x-msdownload]
Saving to: ‘DSN-CTL-V23R01.exe’
DSN-CTL-V23R01.exe 100%[==========================================>] 6.18M 3.71MB/s in 1.7s
2023-07-04 13:19:32 (3.71 MB/s) - ‘DSN-CTL-V23R01.exe’ saved [6475848/6475848]
$ shasum -a256 DSN-CTL-V23R01.exe
b4e855f92c4ae8cec77b9ccaf8b6e0cf53134eb47f5e668980e20afdc149d99f DSN-CTL-V23R01.exe
Host info:
$ uname -rvm
6.2.6-76060206-generic #202303130630~1685473338~22.04~995127e SMP PREEMPT_DYNAMIC Tue M x86_64
Python version:
$ python3 -V
Python 3.10.6
Hachoir version:
$ pip show hachoir
Name: hachoir
Version: 3.2.0
Summary: Package of Hachoir parsers used to open binary files
Home-page: http://hachoir.readthedocs.io/
Author: Hachoir team (see AUTHORS file)
Author-email:
License: GNU GPL v2
Location: /home/agrajag9/.local/lib/python3.10/site-packages
Requires:
Required-by:
When extracting simple info like width and height from an MP4File, hachoir seems to parse the entire file before yielding the info.
Dimensions, and probably duration too, should be easily accessible at the start of the file.
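One thing worth trying is the extraction quality knob, which the CLI exposes as --quality (0.0 = fastest, 1.0 = best); whether a low quality actually avoids the full parse for MP4 dimensions is an assumption to verify.

from hachoir.parser import createParser
from hachoir.metadata import extractMetadata

parser = createParser("movie.mp4")  # hypothetical file name
meta = extractMetadata(parser, quality=0.1)  # favour speed over completeness
if meta:
    print(meta.get("width"), meta.get("height"))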
$ hachoir-grep foo tests/files/*.png
Traceback (most recent call last):
File "/home/jwilk/.local/bin/hachoir-grep", line 8, in <module>
sys.exit(main())
File "/home/jwilk/.local/lib/python3.8/site-packages/hachoir/grep.py", line 183, in main
values, pattern, filenames = parseOptions()
File "/home/jwilk/.local/lib/python3.8/site-packages/hachoir/grep.py", line 66, in parseOptions
pattern = str(arguments[0], "ascii")
TypeError: decoding str is not supported
(originally reported by Samuel Thibault in https://bugs.debian.org/969914)
Hello,
It seems that I cannot successfully detect embedded RAR files created with the RAR 5.0 archive format. After a brief review of rar.py, I believe this is because of the slight difference in the file format used by the newer archiver.
Newer versions of WinRAR use a slightly different file magic for RAR files.
; RAR archive version 1.50 onwards
52 61 72 21 1A 07 00
; RAR archive version 5.0 onwards
52 61 72 21 1A 07 01 00
Quick testing...
; Successfully detected embedded RAR inside SFX file. Created with RAR archiver < 5.0
0002fff0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00030000 52 61 72 21 1a 07 00 cf 90 73 00 00 0d 00 00 00 |Rar!.....s......|
00030010 00 00 00 00 08 b9 7a 00 80 23 00 a8 00 00 00 38 |......z..#.....8|
; Failed to detect embedded RAR inside SFX file. Created with RAR archiver >= 5.0
00072bf0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00072c00 52 61 72 21 1a 07 01 00 2c 10 d9 3c 0b 01 05 07 |Rar!....,..<....|
00072c10 00 06 01 01 e4 ce 81 00 6a c4 39 9b 13 03 02 83 |........j.9.....|
To test independently, create a self extracting RAR file with an archiver version >= 5.0. The latest version of WinRAR (software version 5.50) uses this by default now.
1. WinRAR and command line RAR use RAR 5.0 archive format by default.
You can change it to RAR 4.x compatible format with "RAR4" option
in archiving dialog or -ma4 command line switch.
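A sketch of the signature check this report implies (not rar.py's actual code): recognise both magics.

RAR4_MAGIC = b"Rar!\x1a\x07\x00"      # RAR archive version 1.50 onwards
RAR5_MAGIC = b"Rar!\x1a\x07\x01\x00"  # RAR archive version 5.0 onwards

def looks_like_rar(data):
    # The two magics differ from the seventh byte on (00 vs 01 00 suffix).
    return data.startswith(RAR4_MAGIC) or data.startswith(RAR5_MAGIC)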