Comments (19)

maluke avatar maluke commented on August 25, 2024

Non-UTF-8 requests are supposed to be transcoded first with req = req.decode() so that webob can assume utf-8 encoding throughout. Can you confirm if this is an area that req.decode misses?

from webob.

proppy avatar proppy commented on August 25, 2024

My understanding is that each field of a multipart body can have its own charset supplied in its Content-Type header, whereas req.decode() seems to assume that the whole request is encoded with the same character encoding.

The tentative fix uses cgi.FieldStorage.type_options['charset'] to decode each field of the multipart data independently. Should that be moved into the request.decode implementation somehow?

maluke avatar maluke commented on August 25, 2024

I think doing it in decode would make things simpler -- all of the
non-utf-8 code is isolated there.

proppy avatar proppy commented on August 25, 2024

So the charset argument to decode would specify the default charset, and each field of a multipart body would honor its own charset if one is present?
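
To make the proposal concrete, here is a minimal sketch of "default charset, overridden per part". It is not webob's implementation: it uses the standard library email parser rather than cgi.FieldStorage, and the function name decode_multipart_fields, the boundary BOUND, and the extra note field are made up for illustration.

```python
from email import message_from_bytes


def decode_multipart_fields(content_type, body, default_charset="utf-8"):
    """Decode each part with its own charset, else with the default.

    Hypothetical helper for illustration only; not part of webob.
    """
    # Prepend the request-level Content-Type so the parser knows the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("expected a multipart body")
    fields = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        # A charset declared on the part wins; otherwise use the default.
        charset = part.get_content_charset() or default_charset
        fields[name] = part.get_payload(decode=True).decode(charset)
    return fields


# Synthetic body: one part per case (explicit charsets, then no charset).
body = b"".join([
    b"--BOUND\r\n",
    b"Content-Type: text/plain; charset=ISO-2022-JP\r\n",
    b"Content-Disposition: form-data; name=title\r\n",
    b"\r\n",
    "こんにちは".encode("iso2022_jp") + b"\r\n",
    b"--BOUND\r\n",
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n",
    b"Content-Disposition: form-data; name=submit\r\n",
    b"\r\n",
    b"Submit\r\n",
    b"--BOUND\r\n",
    b"Content-Disposition: form-data; name=note\r\n",
    b"\r\n",
    "café".encode("utf-8") + b"\r\n",
    b"--BOUND--\r\n",
])

fields = decode_multipart_fields("multipart/form-data; boundary=BOUND", body)
```

The note part declares no charset, so it falls back to default_charset, while title and submit are decoded with the charsets declared on their own parts.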

maluke avatar maluke commented on August 25, 2024

Exactly. I don't think I've ever seen a multipart body from a browser with
parts that have encodings specified. Are you getting this from a real
user agent, or is it a synthetic test case?

proppy avatar proppy commented on August 25, 2024

We are getting this in the App Engine Python upload handler when an input[type=text] field containing non-UTF-8 data is submitted along with a file upload.

See: http://code.google.com/p/googleappengine/issues/detail?id=2749

maluke avatar maluke commented on August 25, 2024

So, the content-type: ...; charset on a section of a multipart
body differs from the same header on the whole request? An actual full
request (whatever went over the wire) would be great to inspect.

proppy avatar proppy commented on August 25, 2024

Here is the content of os.environ['wsgi.input'] in a manual test case reproducing the failure:

--000e0ce0b196b4ee6804c6c8af94
Content-Type: text/plain; charset=ISO-2022-JP
Content-Disposition: form-data; name=title
Content-Transfer-Encoding: 7bit

 $B$3$s$K$A$O (B
--000e0ce0b196b4ee6804c6c8af94
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: form-data; name=submit

Submit
--000e0ce0b196b4ee6804c6c8af94
Content-Type: message/external-body; charset=ISO-8859-1; blob-key=AMIfv94TgpPBtKTL3a0U9Qh1QCX7OWSsmdkIoD2ws45kP9zQAGTOfGNz4U18j7CVXzODk85WtiL5gZUFklTGY3y4G0Jz3KTPtJBOFDvQHQew7YUymRIpgUXgENS_fSEmInAIQdpSc2E78MRBVEZY392uhph3r-In96t8Z58WIRc-Yikx1bnarWo
Content-Disposition: form-data; name=file; filename="photo.jpg"

Content-Type: image/jpeg
Content-Length: 38491
X-AppEngine-Upload-Creation: 2012-08-08 15:32:29.035959
Content-MD5: ZjRmNGRhYmNhZTkyNzcyOWQ5ZGUwNDgzOWFkNDAxN2Y=
Content-Disposition: form-data; name=file; filename="photo.jpg"


--000e0ce0b196b4ee6804c6c8af94--
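
As a reading aid for the title part above: the payload looks like ISO-2022-JP whose ESC (0x1B) bytes did not survive pasting. Assuming that is what happened, the sketch below restores the escape bytes by hand and shows why decoding with the part's own charset matters.

```python
# Assumption: the two ESC (0x1B) bytes were lost when the body was pasted,
# so they are restored by hand here.
raw_title = b"\x1b$B$3$s$K$A$O\x1b(B"

# Decoding with the charset declared on the part gives the intended text.
title = raw_title.decode("iso2022_jp")

# Every byte is 7-bit, so decoding as UTF-8 does not raise; it silently
# returns the escape sequences as literal characters instead.
wrong = raw_title.decode("utf-8")
```

Because the wrong decode succeeds without an exception, a request-wide charset assumption fails quietly rather than loudly.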

maluke avatar maluke commented on August 25, 2024

Can you provide pprint.pformat(os.environ) as well?

proppy avatar proppy commented on August 25, 2024

I stripped all the environment variables that don't start with HTTP_ for convenience:

{'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 'HTTP_ACCEPT_CHARSET': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.8,ja;q=0.6',
 'HTTP_CACHE_CONTROL': 'max-age=0',
 'HTTP_CONTENT_TYPE': 'multipart/form-data; boundary=20cf3054a8cd0693cc04c6c90173',
 'HTTP_HOST': '3113448.proppy-bugs.appspot.com',
 'HTTP_ORIGIN': 'http://3113448.proppy-bugs.appspot.com',
 'HTTP_USER_AGENT': 'Mozilla/5.0 (X11; CrOS x86_64 2694.0.0) AppleWebKit/537.3 (KHTML, like Gecko) Chrome/22.0.1222.0 Safari/537.3'}
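
For anyone who wants to produce a similar dump, a filter along these lines would do it. This is a sketch: the helper name http_headers_only and the sample dictionary are made up for illustration.

```python
import pprint


def http_headers_only(environ):
    """Keep only the entries whose key starts with HTTP_ (hypothetical helper)."""
    return {key: value for key, value in environ.items()
            if key.startswith("HTTP_")}


# Made-up sample environment, standing in for the real one.
sample_environ = {
    "HTTP_HOST": "example.invalid",
    "HTTP_CACHE_CONTROL": "max-age=0",
    "PATH_INFO": "/upload",
    "REQUEST_METHOD": "POST",
}

filtered = http_headers_only(sample_environ)
dump = pprint.pformat(filtered)
```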

maluke avatar maluke commented on August 25, 2024

Seems like the fix in your pull request is the correct one; requiring req.decode(..) would be silly in this case. The only problem is the unicode literal, which will not work on py3, but I'll fix it myself.

Thank you for the bug report and the patch.

proppy avatar proppy commented on August 25, 2024

Oh, I was working on adding a test to test_request.py :) I guess I could drop it now.

Thanks a lot for merging.

proppy avatar proppy commented on August 25, 2024

Added the extra test just in case you want to merge it too.

maluke avatar maluke commented on August 25, 2024

Here's the correction that was necessary on py3: 27de7f9

Another test case is always helpful, can you please merge the correction above into the new test as well?

maluke avatar maluke commented on August 25, 2024

Basically:

-        self.assertEqual(req.POST['title'].encode('utf-8'),
-                         u'こんにちは'.encode('utf-8'))
+        self.assertEqual(req.POST['title'], text_('こんにちは', 'utf8'))

proppy avatar proppy commented on August 25, 2024

FYI, it seems that nose fails to report the failure correctly when comparing non-ASCII strings (that's why I was encoding things to UTF-8 before comparing them in my previous patch):

  File "/home/proppy/webob/nose-1.1.2-py2.7.egg/nose/plugins/failuredetail.py", line 43, in formatFailure
    return (ec, '\n'.join([str(ev), tbinfo]), tb)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 74-78: ordinal not in range(128)

maluke avatar maluke commented on August 25, 2024

Well, in that case, try self.assertEqual(req.POST['title'].encode('utf8'), text_('こんにちは', 'utf8').encode('utf8'))

The text_('こんにちは', 'utf8') thingy is necessary for things to work on py3 as well.
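
For readers unfamiliar with this kind of compat helper, here is a rough stand-in showing the idea. It is an assumption about how such a helper behaves, not a copy of webob's code, and it is written for Python 3 only.

```python
def text_(value, encoding="latin-1", errors="strict"):
    """Stand-in compat helper: decode bytes, pass text through unchanged.

    Hypothetical re-creation for illustration; not webob's own code.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value


# On Python 3 the literal is already text, so text_ returns it untouched.
expected = text_("こんにちは", "utf8")

# A byte string (what a Python 2 literal would be) gets decoded instead.
from_bytes = text_("こんにちは".encode("utf8"), "utf8")

# Comparing the UTF-8 bytes keeps any failure message ASCII-safe.
same_bytes = expected.encode("utf8") == from_bytes.encode("utf8")
```

Either way the test ends up comparing the same text, which is why the wrapped form works on both interpreters.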

proppy avatar proppy commented on August 25, 2024

Done.

maluke avatar maluke commented on August 25, 2024

Merged
