Comments (13)
Based on the error message it looks like a memory allocation error. Does this happen consistently for one file, or only after running for a while?
from python-magic.
The error occurs when I iterate over files in a directory for a few seconds (approx. 10-15 seconds) and appears to happen at the same file. I've also run my script over the same directory on two different computers, one with 16 GB of RAM and the other with 32 GB of RAM.
If it's a RAM issue, is there a way to clear already scanned files from RAM so that not all of the data stays in memory?
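On the script side of the memory question, one thing that can be ruled out is the hashing step reading whole files into RAM. The sketch below is not from the thread: `hash_file` is a hypothetical stand-in for the poster's `filehash_sha_256` and `filehash_md5` helpers, which are never shown, and it assumes they could be replaced by a chunked read.

```python
import hashlib


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks.

    Hypothetical helper: only one chunk is held in memory at a time,
    so memory use does not grow with file size.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the empty bytes object returned at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `hash_file(fpath)` and `hash_file(fpath, "md5")` would cover both digests in the script. This only addresses Python-side memory; it would not change an allocation failure raised inside libmagic itself.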
Are you creating a new instance of magic.Magic() for each file, or creating one and re-using it?
I'm using it in a for loop: I create one instance and then re-use it by feeding it a file path via a variable.
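For reference, the "create one and re-use it" pattern can be sketched roughly as follows. The function name `iter_mime_types` is made up for illustration, and the detector is passed in as a parameter; the only assumption about it is that it has a `from_file(path)` method, as `magic.Magic` does.

```python
import os


def iter_mime_types(root, detector):
    """Yield (path, result) for every file under root.

    The same detector object is used for every file, so no new
    instance is created inside the loop.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            yield path, detector.from_file(path)
```

With python-magic installed, this would be driven by something like `iter_mime_types(ROOT, magic.Magic(mime=True))`, building the `Magic` object once before the walk starts.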
Can you share the file that seems to trigger this? Which version of libmagic are you using?
Here is the snippet of code that is causing the issue. I'm using version: 0.4.14; I'm using Python 3.9.
import csv
import mimetypes
import os
import os.path as osp
from datetime import datetime

import magic

# ROOT, file (the output CSV path), filehash_sha_256() and filehash_md5()
# are not shown in this snippet.


def file_exists():
    for root, dirs, files in os.walk(ROOT):
        for fpath in [osp.join(root, f) for f in files]:
            size = osp.getsize(fpath)
            sha_256 = filehash_sha_256(fpath)
            md5 = filehash_md5(fpath)
            # Creation and modification timestamps, split into date and time
            CRDate = osp.getctime(fpath)
            C_Date = datetime.fromtimestamp(CRDate).strftime('%m-%d-%Y')
            C_Time = datetime.fromtimestamp(CRDate).strftime('%H:%M:%S.%f')
            MDate = osp.getmtime(fpath)
            M_Date = datetime.fromtimestamp(MDate).strftime('%m-%d-%Y')
            M_Time = datetime.fromtimestamp(MDate).strftime('%H:%M:%S.%f')
            path = osp.realpath(fpath)
            name = osp.basename(fpath)
            # mime = magic.from_buffer(open(fpath, "rb").read(2048))
            mime = magic.from_file(fpath)
            mime_guess_type = mimetypes.guess_type(fpath, strict=True)
            # Check before opening: append mode creates the file if it is missing.
            needs_header = not osp.isfile(file)
            with open(file, "a", newline="") as header_file:
                header = ["File_Name", "File Creation Date", "File Creation Time",
                          "File Modified Date", "File Modified Time", "Byte size",
                          "Path", "MIME", "MIME_Guess", "SHA_256", "MD5"]
                writer = csv.DictWriter(header_file, fieldnames=header)
                if needs_header:
                    writer.writeheader()
                writer.writerow(
                    {
                        "Byte size": size,
                        "MIME": mime,
                        "MIME_Guess": mime_guess_type,
                        "SHA_256": sha_256,
                        "MD5": md5,
                        "File Creation Date": C_Date,
                        "File Creation Time": C_Time,
                        "File Modified Date": M_Date,
                        "File Modified Time": M_Time,
                        "Path": path,
                        "File_Name": name,
                    }
                )
            print(fpath)
> appears to happen at the same file.
Are you able to share the input file that triggers it? What version of libmagic are you using?
It appears that I did not have python lib-magic installed... I tried to install it, but the install is a bit problematic. Is there a trick to it?
If you were running into this error, it looks like you do have libmagic installed; that's what produces the error.
I am also facing the same issue, and my implementation is similar to that of @ccmn98. @ccmn98, did you find a resolution for this issue?
Also this issue comes up only when running the script via PowerShell/Cmd. I ran the same code in my WSL and it seems to work completely fine and does not throw the error.
Same error as #276 which was merged into #293.
These input files trigger the issue:
- https://github.com/ahupp/python-magic/files/9231524/memblock.txt (problematic file attached there)
- https://github.com/ggerganov/whisper.cpp/blob/3998465/bindings/java/src/test/java/io/github/ggerganov/whispercpp/WhisperCppTest.java (the one I ran across)
- https://github.com/twbs/bootstrap/blob/v5.2.2/js/src/util/config.js (added later)
Repro in Windows 10 Pro Sandbox:
- run powershell:
  powershell -executionpolicy remotesigned
- use scoop to install python:
  iex "& {$(irm get.scoop.sh)} -RunAsAdmin"; scoop install --no-update-scoop git python  #v3.11.5
- use pip to install dependencies:
  pip install python-magic  #v0.4.27
  pip install python-magic-bin  #v0.4.14
- get the files:
  git clone --depth 1 https://github.com/ggerganov/whisper.cpp.git
  curl.exe -LO https://github.com/ahupp/python-magic/files/9231524/memblock.txt
- run python to repro the issue:
  import magic
  magic.from_file("whisper.cpp/bindings/java/src/test/java/io/github/ggerganov/whispercpp/WhisperCppTest.java", mime=True)
  magic.from_file("memblock.txt", mime=True)
error details
PS C:\windows\System32> pip install python-magic
Collecting python-magic
Using cached python_magic-0.4.27-py2.py3-none-any.whl (13 kB)
Installing collected packages: python-magic
Successfully installed python-magic-0.4.27
PS C:\windows\System32> pip install python-magic-bin
Collecting python-magic-bin
Using cached python_magic_bin-0.4.14-py2.py3-none-win_amd64.whl (409 kB)
Installing collected packages: python-magic-bin
Successfully installed python-magic-bin-0.4.14
PS C:\windows\System32> python
Python 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> magic.from_file("whisper.cpp/bindings/java/src/test/java/io/github/ggerganov/whispercpp/WhisperCppTest.java", mime=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\WDAGUtilityAccount\scoop\apps\python\current\Lib\site-packages\magic\magic.py", line 135, in from_file
    return m.from_file(filename)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\scoop\apps\python\current\Lib\site-packages\magic\magic.py", line 91, in from_file
    return self._handle509Bug(e)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\scoop\apps\python\current\Lib\site-packages\magic\magic.py", line 100, in _handle509Bug
    raise e
  File "C:\Users\WDAGUtilityAccount\scoop\apps\python\current\Lib\site-packages\magic\magic.py", line 89, in from_file
    return maybe_decode(magic_file(self.cookie, filename))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\scoop\apps\python\current\Lib\site-packages\magic\magic.py", line 255, in magic_file
    return _magic_file(cookie, coerce_filename(filename))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\WDAGUtilityAccount\scoop\apps\python\current\Lib\site-packages\magic\magic.py", line 196, in errorcheck_null
    raise MagicException(err)
magic.magic.MagicException: b"line I64u: regex error 14 for `^[[:space:]]*class[[:space:]]+[[:digit:][:alpha:]:_]+[[:space:]]*\\{(.*[\n]*)*\\}(;)?$', (failed to get memory)"
>>>
See also:
- https://github.com/trailofbits/polyfile - a pure Python re-implementation of libmagic with a truckload of dependencies, which seems to also fail to process this input.
- microsoft/vcpkg#11832 - vcpkg may be able to build libmagic for windows
Thanks for the repro; this is definitely due to the older version of libmagic shipped with python-magic-bin. Just another case where the binaries situation causes trouble.
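Until a newer libmagic is available on Windows, a directory scan can at least be kept from aborting on the files that trigger the regex error. The sketch below is a workaround idea, not something proposed in the thread: `mime_or_fallback` is a hypothetical helper, the detector is passed in as a plain callable, and the fallback is the extension-based `mimetypes.guess_type` that the original script already calls.

```python
import mimetypes


def mime_or_fallback(path, detect, errors=(Exception,)):
    """Return (mime, source) for path.

    detect is any callable taking a path and returning a MIME string.
    If it raises one of `errors`, fall back to the extension-based guess.
    """
    try:
        return detect(path), "content"
    except errors:
        guessed, _encoding = mimetypes.guess_type(path, strict=True)
        # guess_type returns None for unknown extensions.
        return guessed or "application/octet-stream", "extension"
```

With python-magic, `detect` could be `lambda p: magic.from_file(p, mime=True)`. Narrowing `errors` to the `MagicException` class seen in the traceback is probably preferable to the broad default, though how that class is exposed may differ between python-magic and python-magic-bin.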