
Comments (5)

bondeje commented on June 20, 2024

Yes, my tests were on Windows.

Thanks for the detailed response. I appreciate all the information.

from python-zxing.

dlenski commented on June 20, 2024

which force UTF-8 compatibility of the raw and parsed streams. This causes UnicodeDecodeError exceptions with any byte sequences that begin with bytes "larger" than b"\x7f" with which ZXing doesn't appear to have any problems. Just commenting out '.decode()' in each of these lines, I am able to reproduce ZXing results for byte data in both raw and parsed.

Can you give a specific example of a barcode where the ZXing Java library produces a non-UTF8-compatible raw and/or parsed output, and how you think that barcode should be interpreted?

My question: is there a reason the '.decode()' or UTF-8 is even necessary for either raw or parsed or is the use case for this wrapper intended to be for UTF-8 compatible QR codes only?

It would be nice to make it work for as many cases as possible. I'm a bit leery about changing the current/existing behavior (return a Python str) to returning bytes instead, however.


bondeje commented on June 20, 2024

Below is the QR code for the binary packed integer 128: b'\x80'.
[image: Int_128_packed_binary]

For the current UTF-8 decoding, i.e.

raw = raw[:-1].decode()
parsed = parsed[:-1].decode()

it results in:
[image]

But removing the UTF-8 decoding, i.e.

raw = raw[:-1]  # .decode()
parsed = parsed[:-1]  # .decode()

results in what I am expecting:
[image]

I emphasize "what I am expecting" because I'm not sure whether "raw" here is intended to be just the "raw text" or the relevant portion of the "raw bytes", as shown in the https://zxing.org/w/decode output for the same QR code. It seems that the zxing CommandLineRunner is able to return the "raw bytes" when subprocess runs with no encoding (None), as it currently does. I understand if this is not supported: getting all three of "raw text", part of "raw bytes", and "parsed result" out of the subprocess (whose output is either all bytes or all strings in one specified encoding) would probably take ugly try/except logic and guesswork about which decoding/parsing is intended.
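
For reference, the failure and the kind of try/except guessing described above can be sketched in plain Python, independent of the wrapper. The helper name decode_or_bytes is hypothetical and not part of python-zxing; it only illustrates returning a str when the bytes are valid UTF-8 and handing back the untouched bytes otherwise.

```python
def decode_or_bytes(data: bytes):
    """Hypothetical helper: return str if data is valid UTF-8, else the raw bytes."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data


# b'\x80' is a lone continuation byte, so strict UTF-8 decoding rejects it.
try:
    b"\x80".decode()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(strict_failed)
print(decode_or_bytes(b"hello"))
print(decode_or_bytes(b"\x80"))
```

The obvious drawback is the mixed return type (sometimes str, sometimes bytes), which every caller would then have to check.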

[image]

I am not doing anything special with reading the QR code:

import zxing

if __name__ == '__main__':
    img_file = './python-zxing_tests/Int_128_packed_binary.png' # the attached QR code
    print(zxing.BarCodeReader().decode(img_file))


dlenski commented on June 20, 2024

Below is the QR code for the binary packed integer 128: b'\x80'. https://user-images.githubusercontent.com/88994019/130013442-60545f8d-432d-491d-b3e4-37891cde13cc.png

The QR code symbology does not provide any mechanism to encode "binary" barcodes. The only well-defined use of it is to encode strings in various character encodings.

The default interpretation of the bytes in a QR code (same for PDF417, Aztec code, DataMatrix) is ISO-8859-1; any barcode that intends another character encoding must declare it using ECI codes in order to be interpreted correctly.

(See https://stackoverflow.com/questions/27857718/aztec-barcode-vs-qr-code/64585749 for some work I've done in this area.)
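
To make the default-interpretation point concrete, here is a small sketch in plain Python (no barcode library involved, and ECI handling itself is not shown). The same two bytes read as one character under UTF-8 and as two different characters under ISO-8859-1, which is why a decoder that assumes the default will misread UTF-8 content that was not declared.

```python
# The UTF-8 encoding of "é" is two bytes.
payload = "é".encode("utf-8")

# Interpreted with the charset the encoder actually used:
as_utf8 = payload.decode("utf-8")

# Interpreted with the ISO-8859-1 default: each byte becomes one character.
as_latin1 = payload.decode("iso-8859-1")

print(payload, as_utf8, as_latin1)
```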

So:

  1. There's no well-defined way to signal that a QR code should be interpreted as "binary". ISO-8859-1 sorta-kinda works as a binary fallback, because it has no multi-byte character sequences, so tolerant encoders/decoders can reversibly decode the bytes as ISO-8859-1 and then re-encode them, including unknown bytes.
  2. The ZXing command-line-runner mangles the output of raw bytes beyond recognition on some operating systems, if they can't be correctly interpreted as UTF-8. For example, the QR code you give as an example gets completely borked on Linux. (I'm guessing you're testing on Windows?) See aff3dde where I added your file as an example, along with some other possible changes:
======================================================================
FAIL: test_all.test_decoding('QR_CODE-binary-80.png', 'QR_CODE', b'\x80')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/dlenski/floss/python-zxing/test/test_all.py", line 50, in _check_decoding
    raise AssertionError('Expected {!r} but got {!r}'.format(expected_raw, dec.raw_bytes))
AssertionError: Expected b'\x80' but got b'\xef\xbf\xbd'
  3. In order to improve this situation, the ZXing command-line runner would have to be changed so that it does not mangle unknown bytes.
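
Both halves of this can be checked in plain Python. The sketch below shows that ISO-8859-1 round-trips every possible byte value, and that decoding b'\x80' as UTF-8 with replacement and re-encoding yields exactly the b'\xef\xbf\xbd' seen in the failing test. That the command-line runner does something equivalent internally is an assumption; only the resulting bytes are confirmed by the traceback.

```python
# ISO-8859-1 maps each of the 256 byte values to one code point, so the
# decode/encode round trip is lossless for arbitrary binary data.
all_bytes = bytes(range(256))
round_tripped = all_bytes.decode("iso-8859-1").encode("iso-8859-1")

# UTF-8 with replacement is lossy: the invalid byte becomes U+FFFD,
# which re-encodes as three bytes and cannot be mapped back to b'\x80'.
mangled = b"\x80".decode("utf-8", errors="replace").encode("utf-8")

print(round_tripped == all_bytes)
print(mangled)
```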

My question: is there a reason the '.decode()' or UTF-8 is even necessary for either raw or parsed or is the use case for this wrapper intended to be for UTF-8 compatible QR codes only?

I guess the "tl;dr" here is that there's no way to use the ZXing command-line runner, in its current form, to decode barcodes whose contents don't map to a sequence of UTF-8 characters.


dlenski commented on June 20, 2024

Yes, my tests were on Windows.

Good to know, thanks! Interesting that ZXing CLI mangles the output less on Windows. It'd be good to get a stable encoding-proof output method upstream.

I lack the bandwidth to work on this now, but would be happy to review an upstream PR if you want to contribute it. :-D

