
Comments (7)

ahupp commented on May 22, 2024

What happens when you do open(.. that filename...), does it produce the
same error or throw?

On Tue, Jan 15, 2013 at 6:15 AM, Marian Steinbach wrote:

I have a file named:

'aktuelle_Dokumente.jsp?docTyp=ST&wp=15&dokNum=8.+Schulrechts\xe4\xae\xa4erungsgesetz&searchDru=suchen'

When I try to read this with magic's from_file method, I get the following
exception:

Traceback (most recent call last):
  File "repo-audit.py", line 133, in <module>
    auditor.run()
  File "repo-audit.py", line 27, in run
    e = AuditEntry(fullpath, self.logfile, self.mime_magic)
  File "repo-audit.py", line 61, in __init__
    self.mimetype = self.file_type()
  File "repo-audit.py", line 119, in file_type
    return self.mime_magic.from_file(self.path)
  File "/.../venv/lib/python2.7/site-packages/magic.py", line 70, in from_file
    return magic_file(self.cookie, filename)
  File "/.../venv/lib/python2.7/site-packages/magic.py", line 170, in magic_file
    return _magic_file(cookie, coerce_filename(filename))
  File "/.../venv/lib/python2.7/site-packages/magic.py", line 146, in coerce_filename
    return filename.encode(sys.getfilesystemencoding())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 130: ordinal not in range(128)

I am on MacOS X 10.8.2 with python 2.7.2.



Adam Hupp | http://hupp.org/adam/

from python-magic.

ahupp commented on May 22, 2024

Have a chance to look at this?


commented on May 22, 2024

I can confirm this behaviour. Seems like coerce_filename has problems with UTF-8 encoded file names. Can be reproduced like this:

import magic
path = "/tmp/test\xfc.txt" # == /tmp/testü.txt (German u-umlaut)
with open(path, "w") as f:
    f.write('\n') # works
magic.coerce_filename(path) # fails with:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/python_magic-0.4.6-py2.7.egg/magic.py", line 183, in coerce_filename
    return filename.encode(sys.getfilesystemencoding())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 9: ordinal not in range(128)

I'm running Python 2.7.5 with python_magic-0.4.6
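
A note on why an .encode() call reports a *decode* error: in Python 2, calling .encode() on a byte str first decodes it implicitly with the ASCII codec, and that hidden step is what fails on the non-ASCII byte. Below is a minimal sketch of that hidden step, written in Python 3 syntax so the decode is explicit. The variable names are illustrative and not taken from magic.py.

```python
# The path from the reproduction above, as raw bytes (0xfc is the non-ASCII byte).
path_bytes = b"/tmp/test\xfc.txt"

error_position = None
error_reason = None
try:
    # Python 2 performs this decode implicitly when .encode() is
    # called on a byte str; here it is spelled out.
    path_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error_position = exc.start   # index of the offending byte
    error_reason = exc.reason

print(error_position, error_reason)
```

The reported position matches the traceback above: "/tmp/test" is nine bytes long, so the offending byte sits at index 9.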


ahupp commented on May 22, 2024

I think the problem is that getfilesystemencoding() only returns UTF-8 if your LANG is set appropriately. Otherwise it's ascii or similar. I can repro the error that way in the unit test.
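
To illustrate the point, here is a small sketch (Python 3 syntax) showing that encoding a non-ASCII name fails under an ASCII codec but succeeds under UTF-8, whatever sys.getfilesystemencoding() happens to report on a given machine. The helper try_encode is hypothetical and exists only for this example.

```python
import sys

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

name = "test\u00fc.txt"  # contains a u-umlaut

ascii_result = try_encode(name, "ascii")   # fails: not representable in ASCII
utf8_result = try_encode(name, "utf-8")    # works regardless of LANG

# The value below depends on the environment, which is the source of the
# inconsistent behavior.
print(sys.getfilesystemencoding(), ascii_result, utf8_result)
```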

Does this patch work for you?

diff --git a/magic.py b/magic.py
index cd5ff24..10685ac 100644
--- a/magic.py
+++ b/magic.py
@@ -193,14 +193,15 @@ def coerce_filename(filename):
         return None
 
     # ctypes will implicitly convert unicode strings to bytes with
-    # .encode('ascii').  A more useful default here is
-    # getfilesystemencoding().  We need to leave byte-str unchanged.
+    # .encode('ascii').  If you use the filesystem encoding
+    # then you'll get inconsistent behavior (crashes) depending on the user's
+    # LANG environment variable
     is_unicode = (sys.version_info[0] <= 2 and
                   isinstance(filename, unicode)) or \
                  (sys.version_info[0] >= 3 and
                   isinstance(filename, str))
     if is_unicode:
-        return filename.encode(sys.getfilesystemencoding())
+        return filename.encode('utf-8')
     else:
         return filename
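
For readers on Python 3 only, the logic of the patch above can be sketched roughly as follows. This is not the library's code: the name coerce_filename_sketch and the encoding parameter are assumptions made for illustration, and the version check is dropped because only Python 3's str type is handled.

```python
def coerce_filename_sketch(filename, encoding="utf-8"):
    """Return bytes for a C API call; None and bytes pass through unchanged."""
    if filename is None:
        return None
    if isinstance(filename, str):
        # A fixed encoding avoids depending on the user's LANG setting.
        return filename.encode(encoding)
    return filename

encoded = coerce_filename_sketch("/tmp/test\u00fc.txt")
untouched = coerce_filename_sketch(b"/tmp/test\xfc.txt")
print(encoded, untouched)
```

Byte strings are returned as-is, so a caller that already holds the raw on-disk name (as in the reports above) never goes through an implicit decode.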


ahupp commented on May 22, 2024

Oops, try this: https://gist.github.com/ahupp/daff4e7e2e4ebafbe14c


commented on May 22, 2024

Yes, the patch works.
Thanks a lot!


ahupp commented on May 22, 2024

Fixed in 012f8a9
