shibukawa / imagesize_py

License: MIT License


imagesize_py's Introduction

imagesize

This module analyzes JPEG/JPEG 2000/PNG/GIF/TIFF/SVG/Netpbm/WebP image headers and returns the image size or DPI.

import imagesize

width, height = imagesize.get("test.png")
print(width, height)

xdpi, ydpi = imagesize.getDPI("test.png")
print(xdpi, ydpi)

This is a pure Python module. Instead of a file path, you can also pass a file-like object, such as an open file or an io.BytesIO.

API

imagesize.get(filepath)

Returns image size (width, height).
imagesize.getDPI(filepath)

Returns image DPI as (horizontal, vertical).
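For PNG files, DPI is not part of the fixed header: it lives in the optional pHYs chunk as pixels per unit for X and Y plus a unit byte, where 1 means meters. The stdlib-only sketch below shows how such a chunk could be converted to DPI; it is not imagesize's own code, and the name `png_dpi` and the (-1, -1) result for files without a usable pHYs chunk are assumptions.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
INCH_PER_METER = 0.0254


def png_dpi(data: bytes) -> tuple:
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or (-1, -1) if unavailable (assumed convention)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("Not a PNG file")
    pos = len(PNG_SIGNATURE)
    # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC (not checked here).
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">L4s", data[pos:pos + 8])
        if ctype == b"pHYs" and length == 9:
            x_ppu, y_ppu, unit = struct.unpack(">LLB", data[pos + 8:pos + 17])
            if unit == 1:  # pixels per meter
                return round(x_ppu * INCH_PER_METER), round(y_ppu * INCH_PER_METER)
            return -1, -1  # unit 0 only gives an aspect ratio, not an absolute DPI
        if ctype == b"IDAT":
            break  # pHYs must come before the first IDAT chunk
        pos += 12 + length
    return -1, -1
```

For example, 2835 pixels per meter rounds to 72 DPI and 11811 pixels per meter to 300 DPI.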

Benchmark

It parses only the headers and ignores pixel data, so it is much faster than Pillow (about 10x in the benchmark below).

Module	Time for 100,000 calls
imagesize (pure Python)	1.077 s
Pillow	10.569 s

Tested on a MacBook Pro (2014, Core i7) with 125 kB PNG files.
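Why header-only parsing is cheap can be seen with PNG, whose dimensions sit at fixed offsets within the first 24 bytes of the file. This is a minimal sketch of that idea rather than imagesize's implementation; `png_size` is a hypothetical name, and it takes any binary file-like object (an open file or an io.BytesIO) and reads nothing past byte 24.

```python
import io
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(stream) -> tuple:
    """Return (width, height) by reading only the first 24 bytes of a PNG stream."""
    # 8-byte signature + 4-byte chunk length + b"IHDR" + 4-byte width + 4-byte height
    head = stream.read(24)
    if len(head) < 24 or not head.startswith(PNG_SIGNATURE) or head[12:16] != b"IHDR":
        raise ValueError("Not a PNG file with a leading IHDR chunk")
    width, height = struct.unpack(">LL", head[16:24])
    return width, height


fake_png = PNG_SIGNATURE + struct.pack(">L", 13) + b"IHDR" + struct.pack(">LL", 640, 480)
print(png_size(io.BytesIO(fake_png + b"\x00" * 1000)))  # pixel bytes are never read
```

A JPEG is not this simple: its size is stored in a SOFn segment whose position depends on the segments before it, so a reader has to walk and skip those segments first.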

Development

Run the tests with the following command:

python -m unittest

License

MIT License

Thanks

I referred to the following code:

I used a sample image from here:

https://www.nightprogrammer.org/development/multipage-tiff-example-download-test-image-file/

Thank you for your feedback:

tk0miya (https://github.com/tk0miya)
shimizukawa (https://github.com/shimizukawa)
xantares (https://github.com/xantares)
Ivan Zakharyaschev (https://github.com/imz)
Jon Dufresne (https://github.com/jdufresne)
Geoff Lankow (https://github.com/darktrojan)
Hugo (https://github.com/hugovk)
Jack Cherng (https://github.com/jfcherng)
Tyler A. Young (https://github.com/s3cur3)
Mark Browning (https://github.com/mabrowning)
ossdev07 (https://github.com/ossdev07)
Nicholas-Schaub (https://github.com/Nicholas-Schaub)
Nuffknacker (https://github.com/Nuffknacker)
Hannes Römer (https://github.com/hroemer)
mikey (https://github.com/ffreemt)
Marco (https://github.com/marcoffee)
ExtReMLapin (https://github.com/ExtReMLapin)


imagesize_py's Issues

MacPorts imagesize_py port

#51

MacPorts has not been updated to version 1.4.1; it still ships 1.3.0:

py39-imagesize @1.3.0 (python, devel, graphics)

Description:          This module analyzes jpeg/jpeg2000/png/gif image headers and returns the image size.
Homepage:             https://github.com/shibukawa/imagesize_py

Build Dependencies:   py39-setuptools
Library Dependencies: python39
Test Dependencies:    py39-pytest
Platforms:            darwin
License:              MIT
Maintainers:          none

EXIF rotation tags

If an image contains an EXIF rotation flag, the returned size has to be rotated accordingly. Any plans to add this? 👀

Could you create git tags for the releases you upload to PyPI? That would make it easy to find the exact commit each release was built from, and it would provide an alternative download location. PyPI now uses hashed URLs instead of predictable ones, which is really annoying when packaging software: I have to go to PyPI and copy the URL instead of just changing the version number for each new release.

Width and height of image are transposed when EXIF contains rotation metadata

EXIF supports an 'Orientation' tag, which can instruct viewers to rotate the image by 90 degrees when opening it. Every program I have tested respects it, but unfortunately imagesize does not, which gives incorrect results.

Sample image:

Same image with orientation EXIF set to rotate 90 degrees CW:

However, when I run the following code:

import imageio
import imagesize

print(imagesize.get('example-exif.jpg'))  # This prints (300, 100)
print(imagesize.get('example-exif-rotated.jpg'))  # This prints (300, 100)

# Note that numpy image dimensions are (height, width, colors)
print(imageio.imread('example-exif.jpg').shape)  # This prints (100, 300, 3)
print(imageio.imread('example-exif-rotated.jpg').shape)  # This prints (300, 100, 3)
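One way to honour the tag, sketched with the standard library only: find the APP1 Exif segment, read the Orientation tag (0x0112) from IFD0 of the embedded TIFF structure, and swap width and height for orientations 5 to 8, which are the ones that include a 90-degree turn. The function names are hypothetical, this is not part of imagesize, and malformed EXIF data will surface as struct.error rather than a friendly message.

```python
import struct


def exif_orientation(jpeg: bytes) -> int:
    """Return the EXIF Orientation value (1-8) of a JPEG, or 1 if none is found."""
    if not jpeg.startswith(b"\xff\xd8"):
        raise ValueError("Not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg):
        marker, length = struct.unpack(">2sH", jpeg[pos:pos + 4])
        if marker[0] != 0xFF or marker[1] in (0xD9, 0xDA):  # bad marker, EOI, or start of scan
            break
        segment = jpeg[pos + 4:pos + 2 + length]  # length counts its own 2 bytes
        if marker[1] == 0xE1 and segment.startswith(b"Exif\x00\x00"):
            return _orientation_from_tiff(segment[6:])
        pos += 2 + length
    return 1


def _orientation_from_tiff(tiff: bytes) -> int:
    endian = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if endian is None:
        return 1
    (ifd_offset,) = struct.unpack(endian + "L", tiff[4:8])
    (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        tag, typ, _count = struct.unpack(endian + "HHL", tiff[start:start + 8])
        if tag == 0x0112 and typ == 3:  # Orientation, type SHORT
            (value,) = struct.unpack(endian + "H", tiff[start + 8:start + 10])
            return value
    return 1


def oriented_size(width: int, height: int, orientation: int) -> tuple:
    """Swap width and height for orientations 5-8, which involve a 90-degree turn."""
    return (height, width) if orientation in (5, 6, 7, 8) else (width, height)
```

With the issue's rotated file, oriented_size(300, 100, 6) would give (100, 300), matching the (height, width) order reported by imageio once transposed.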

How to access version information?

A perhaps naive question: since the standard `__version__` attribute is not defined, how can we access the version?
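Whether or not the module defines `__version__`, on Python 3.8+ the version of the installed distribution can be read from its packaging metadata with the standard library's importlib.metadata. A small sketch (the helper name `installed_version` is hypothetical):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("imagesize"))  # a version string, or None if imagesize is not installed
```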

Getting the number of channels?

Is it possible to add functionality to also parse out the number of channels in the image? I'd like to distinguish between grayscale, RGB, and RGBA images. It would be OK if it got confused by things like color palettes.

The reason is that I'd like to incorporate this into my kwimage.load_image_shape function as it is stupidly faster than PIL and GDAL:

        >>> # For large files, PIL is much faster than GDAL
        >>> from osgeo import gdal
        >>> from PIL import Image
        >>> import timerit
        >>> #
        >>> import kwimage
        >>> fpath = kwimage.grab_test_image_fpath()
        >>> #
        >>> ti = timerit.Timerit(100, bestof=10, verbose=2)
        >>> for timer in ti.reset('gdal'):
        >>>     with timer:
        >>>         gdal_dset = gdal.Open(fpath, gdal.GA_ReadOnly)
        >>>         width = gdal_dset.RasterXSize
        >>>         height = gdal_dset.RasterYSize
        >>>         gdal_dset = None
        >>> #
        >>> for timer in ti.reset('PIL'):
        >>>     with timer:
        >>>         pil_img = Image.open(fpath)
        >>>         width, height = pil_img.size
        >>>         pil_img.close()
        >>> # The imagesize module is quite fast
        >>> import imagesize
        >>> for timer in ti.reset('imagesize'):
        >>>     with timer:
        >>>         width, height = imagesize.get(fpath)

Timed gdal for: 100 loops, best of 10
    time per loop: best=83.266 µs, mean=85.919 ± 2.1 µs
Timed PIL for: 100 loops, best of 10
    time per loop: best=38.191 µs, mean=38.981 ± 0.7 µs
Timed imagesize for: 100 loops, best of 10
    time per loop: best=8.269 µs, mean=8.516 ± 0.2 µs

But in those use-cases it's often important to know how many channels there will be as well. Is that possible to parse out of the headers?
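For PNG at least, the headers do carry this: the IHDR chunk that holds width and height continues with a bit-depth byte and a colour-type byte, and the colour type determines the samples per pixel. (For JPEG, the SOFn segment similarly stores a component count right after the width.) A stdlib-only sketch returning a numpy-style (height, width, channels) tuple; `png_shape` is a hypothetical name, not an imagesize API, and palette images (colour type 3) are reported as 1 channel even though they usually decode to RGB or RGBA.

```python
import struct

# PNG IHDR colour type -> samples per pixel
PNG_CHANNELS = {
    0: 1,  # grayscale
    2: 3,  # RGB
    3: 1,  # palette index (usually expands to RGB or RGBA when decoded)
    4: 2,  # grayscale + alpha
    6: 4,  # RGBA
}


def png_shape(head: bytes) -> tuple:
    """Return (height, width, channels) from the first 26 bytes of a PNG file."""
    if len(head) < 26 or not head.startswith(b"\x89PNG\r\n\x1a\n") or head[12:16] != b"IHDR":
        raise ValueError("Not a PNG file with a leading IHDR chunk")
    width, height, _bit_depth, color_type = struct.unpack(">LLBB", head[16:26])
    channels = PNG_CHANNELS.get(color_type)
    if channels is None:
        raise ValueError(f"Unknown PNG colour type {color_type}")
    return height, width, channels
```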

Returns (-1,-1) instead of exception when used on something unsupported

When using imagesize.get() on anything that is not supported (text files, empty files, etc.), the method just returns (-1, -1). I would suggest raising a ValueError instead (as is already the case for a random XML file, since it is parsed as SVG).

On the other hand, that would be a behavior change, so maybe either document the current behavior or bump the major version if this change is implemented?
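Until the library's behaviour changes, callers can get the requested semantics with a thin wrapper. In this sketch `strict_size` is a hypothetical helper that takes the size function as an argument (for example imagesize.get) and treats a negative width or height as unsupported input:

```python
from typing import Callable, Tuple


def strict_size(get_size: Callable[[object], Tuple[int, int]], source) -> Tuple[int, int]:
    """Call get_size(source) and raise ValueError instead of returning (-1, -1)."""
    width, height = get_size(source)
    if width < 0 or height < 0:
        raise ValueError(f"Unsupported or unrecognised image: {source!r}")
    return width, height
```

For example, strict_size(imagesize.get, "notes.txt") would raise ValueError where imagesize.get alone returns (-1, -1).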

1.4.1: pep517 build fails

Source code from git tag

+ /usr/bin/python3 -sBm build -w --no-isolation
* Getting dependencies for wheel...
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/pep517/in_process/_in_process.py", line 363, in <module>
    main()
  File "/usr/lib/python3.8/site-packages/pep517/in_process/_in_process.py", line 345, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/usr/lib/python3.8/site-packages/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
    return hook(config_settings)
  File "/usr/lib/python3.8/site-packages/setuptools/build_meta.py", line 177, in get_requires_for_build_wheel
    return self._get_build_requires(
  File "/usr/lib/python3.8/site-packages/setuptools/build_meta.py", line 159, in _get_build_requires
    self.run_setup()
  File "/usr/lib/python3.8/site-packages/setuptools/build_meta.py", line 281, in run_setup
    super(_BuildMetaLegacyBackend,
  File "/usr/lib/python3.8/site-packages/setuptools/build_meta.py", line 174, in run_setup
    exec(compile(code, __file__, 'exec'), locals())
  File "setup.py", line 4, in <module>
    from imagesize import __version__
ImportError: cannot import name '__version__' from 'imagesize' (unknown location)
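A common way to avoid importing the package from setup.py is to read the version string out of the module source as text. This is a sketch under the assumption that imagesize.py contains a line such as `__version__ = "1.4.1"`; `read_version` is a hypothetical helper, not something the project ships:

```python
import re
from pathlib import Path


def read_version(module_path: str) -> str:
    """Extract __version__ = "..." from a module's source without importing it."""
    source = Path(module_path).read_text(encoding="utf-8")
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"__version__ not found in {module_path}")
    return match.group(1)
```

setup.py could then pass version=read_version(os.path.join(os.path.dirname(__file__), "imagesize.py")) to setup(), which no longer depends on the module being importable during the build.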

Reading image size of remote file

Has anyone considered adding this functionality?
From my brief experiments, it's a little trickier than swapping

    # with open(str(filepath), 'rb') as fhandle:
    with urllib.request.urlopen(str(url)) as fhandle:

Though that works fine for png files, anything that needs to seek, like a jpeg, will fail.
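One workaround for non-seekable streams is to read a bounded prefix of the response into an io.BytesIO, which supports seek(), and parse that instead. A sketch; `seekable_prefix` and the 64 KiB default are assumptions, and a JPEG whose SOFn segment lies beyond the limit (for example behind a large EXIF thumbnail) would still fail:

```python
import io


def seekable_prefix(stream, limit: int = 64 * 1024) -> io.BytesIO:
    """Read up to `limit` bytes from a (possibly non-seekable) stream into a seekable buffer."""
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(remaining)  # may return fewer bytes than requested
        if not chunk:
            break  # end of stream
        chunks.append(chunk)
        remaining -= len(chunk)
    return io.BytesIO(b"".join(chunks))
```

With urllib this would look like `with urllib.request.urlopen(url) as fhandle: imagesize.get(seekable_prefix(fhandle))`, relying on imagesize.get accepting an io.BytesIO.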

TIFF support (particularly multipage)

Would be nice to have TIFF support. In particular, it would be nice if it could detect and properly handle multipage TIFFs.
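In TIFF, every page is an IFD, and each IFD ends with the 4-byte offset of the next one (0 for the last page), so multipage support mostly amounts to following that chain. A stdlib-only sketch, not imagesize's code: `tiff_page_sizes` is a hypothetical name, it reads ImageWidth (256) and ImageLength (257) as SHORT or LONG, ignores SubIFDs, and assumes well-formed offsets:

```python
import struct


def tiff_page_sizes(data: bytes) -> list:
    """Return [(width, height), ...] for every IFD (page) in a TIFF file."""
    endian = {b"II*\x00": "<", b"MM\x00*": ">"}.get(data[:4])
    if endian is None:
        raise ValueError("Not a TIFF file")
    (offset,) = struct.unpack(endian + "L", data[4:8])
    sizes = []
    seen = set()
    while offset and offset not in seen:  # offset 0 ends the chain; `seen` guards against loops
        seen.add(offset)
        (count,) = struct.unpack(endian + "H", data[offset:offset + 2])
        width = height = -1
        for i in range(count):
            start = offset + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, typ = struct.unpack(endian + "HH", data[start:start + 4])
            if tag not in (256, 257):
                continue
            if typ == 3:  # SHORT, stored in the first 2 bytes of the value field
                (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            else:  # LONG
                (value,) = struct.unpack(endian + "L", data[start + 8:start + 12])
            if tag == 256:
                width = value
            else:
                height = value
        sizes.append((width, height))
        next_pos = offset + 2 + 12 * count
        (offset,) = struct.unpack(endian + "L", data[next_pos:next_pos + 4])
    return sizes
```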

`_convertToPx` discards fractional units

In `_convertToPx`, all length values are cast to integers. This breaks SVG files that specify width and height as floats. For example, "25.4mm" becomes "25mm", so the length is 94.488px instead of 96px.
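A sketch of a converter that keeps the fractional part by parsing the number with float(). The unit factors assume 96 px per inch, as in CSS; `length_to_px` and `PX_PER_UNIT` are hypothetical names and this is not the module's actual _convertToPx:

```python
import re

# Absolute units expressed in px, assuming 96 px per inch (1 in = 2.54 cm = 25.4 mm = 72 pt = 6 pc).
PX_PER_UNIT = {
    "": 1.0,
    "px": 1.0,
    "in": 96.0,
    "cm": 96.0 / 2.54,
    "mm": 96.0 / 25.4,
    "pt": 96.0 / 72.0,
    "pc": 96.0 / 6.0,
}

_LENGTH = re.compile(r"\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)\s*([a-z]*)\s*")


def length_to_px(value: str) -> float:
    """Convert an SVG length such as "25.4mm" to pixels without truncating the number."""
    match = _LENGTH.fullmatch(value)
    if match is None or match.group(2) not in PX_PER_UNIT:
        raise ValueError(f"Unsupported length value: {value!r}")
    return float(match.group(1)) * PX_PER_UNIT[match.group(2)]
```

With this, "25.4mm" converts to 96px (up to floating-point rounding) instead of 94.488px.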

missing test.jp2 file

see test/test_get.py:

    def test_load_jpeg2000(self):
        width, height = imagesize.get(os.path.join(imagedir, "test.jp2"))
        self.assertEqual(width, 802)
        self.assertEqual(height, 670)

Also, it would be great if you could set up Travis integration to run these tests.

cannot read a svg file

I'm trying to get the size of a .svg file. The file displays fine on my computer, but this library raises an error.
Do you have any idea why?

To reproduce:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg viewBox="0 0 1 1" xmlns="http://www.w3.org/2000/svg">
    <style> * { fill: black } </style>
    <polygon points="0,1 1,1 0.5,0" class="triangle" />
</svg>

import imagesize

w, h = imagesize.get("triangle.svg")

The full error traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/imagesize.py:210, in get(filepath)
    209 data = data.decode('utf-8')
--> 210 width = re.search(r'[^-]width="(.*?)"', data).group(1)
    211 height = re.search(r'[^-]height="(.*?)"', data).group(1)

AttributeError: 'NoneType' object has no attribute 'group'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/Users/pierrickrambaud/Documents/travail/FAO/app_buffer/sphinx-favicon/toto.ipynb Cell 3 in <cell line: 3>()
      1 import imagesize
----> 3 w, h = imagesize.get("/Users/pierrickrambaud/Documents/travail/FAO/app_buffer/sphinx-favicon/tests/roots/test-static_files/gfx/nested/triangle.svg")

File ~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/imagesize.py:213, in get(filepath)
    211     height = re.search(r'[^-]height="(.*?)"', data).group(1)
    212 except Exception:
--> 213     raise ValueError("Invalid SVG file")
    214 width = _convertToPx(width)
    215 height = _convertToPx(height)

ValueError: Invalid SVG file
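The sample SVG has a viewBox but no width or height attributes, so the width="..." regex finds nothing and .group(1) is called on None. A sketch of one possible fallback, reading attributes from the root element only and using the viewBox size when width/height are missing. Treating viewBox user units as pixels is an assumption (without width and height, the viewBox mainly fixes the aspect ratio), `svg_size` is a hypothetical name, and only unitless or px lengths are handled:

```python
import re
import xml.etree.ElementTree as ET


def _px(value: str) -> float:
    # Only unitless or "px" lengths are handled here; other units would need conversion.
    return float(value[:-2] if value.endswith("px") else value)


def svg_size(svg: bytes) -> tuple:
    """Return (width, height) of an SVG document, falling back to its viewBox."""
    root = ET.fromstring(svg)  # attributes of the root <svg> only, not of nested elements
    width, height = root.get("width"), root.get("height")
    if width is not None and height is not None:
        return _px(width), _px(height)
    viewbox = root.get("viewBox")
    if viewbox is None:
        raise ValueError("SVG has neither width/height attributes nor a viewBox")
    # viewBox is "min-x min-y width height", separated by whitespace and/or commas.
    parts = [float(v) for v in re.split(r"[\s,]+", viewbox.strip())]
    if len(parts) != 4:
        raise ValueError(f"Malformed viewBox: {viewbox!r}")
    return parts[2], parts[3]
```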

1.4.1: pytest is failing in `test/test_get_filelike.py::test_get_filelike` unit

It looks like the URL used by the test suite no longer works (HTTP 404), so the test/test_get_filelike.py::test_get_filelike unit fails:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-imagesize-1.4.1-5.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-imagesize-1.4.1-5.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra -m 'not network'
============================= test session starts ==============================
platform linux -- Python 3.8.17, pytest-7.4.0, pluggy-1.2.0
rootdir: /home/tkloczko/rpmbuild/BUILD/imagesize_py-1.4.1
collected 45 items

test/test_get.py .......................                                 [ 51%]
test/test_get_filelike.py F                                              [ 53%]
test/test_getdpi.py .....................                                [100%]

=================================== FAILURES ===================================
______________________________ test_get_filelike _______________________________

    def test_get_filelike():
        """ test_get_filelike. """

        url = 'https://www.tsln.com/wp-content/uploads/2018/10/bears-tsln-101318-3-1240x826.jpg'
        try:
>           response = urlopen(url)

test/test_get_filelike.py:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.8/urllib/request.py:222: in urlopen
    return opener.open(url, data, timeout)
/usr/lib64/python3.8/urllib/request.py:531: in open
    response = meth(req, response)
/usr/lib64/python3.8/urllib/request.py:640: in http_response
    response = self.parent.error(
/usr/lib64/python3.8/urllib/request.py:569: in error
    return self._call_chain(*args)
/usr/lib64/python3.8/urllib/request.py:502: in _call_chain
    result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib.request.HTTPDefaultErrorHandler object at 0x7f3467737cd0>
req = <urllib.request.Request object at 0x7f34677378b0>
fp = <http.client.HTTPResponse object at 0x7f34677f60d0>, code = 404
msg = 'Not Found', hdrs = <http.client.HTTPMessage object at 0x7f34676b1760>

    def http_error_default(self, req, fp, code, msg, hdrs):
>       raise HTTPError(req.full_url, code, msg, hdrs, fp)
E       urllib.error.HTTPError: HTTP Error 404: Not Found

/usr/lib64/python3.8/urllib/request.py:649: HTTPError

During handling of the above exception, another exception occurred:

    def test_get_filelike():
        """ test_get_filelike. """

        url = 'https://www.tsln.com/wp-content/uploads/2018/10/bears-tsln-101318-3-1240x826.jpg'
        try:
            response = urlopen(url)
            raw = response.read()
        except Exception as exc:
>           raise SystemExit(exc)
E           SystemExit: HTTP Error 404: Not Found

test/test_get_filelike.py:31: SystemExit
=========================== short test summary info ============================
FAILED test/test_get_filelike.py::test_get_filelike - SystemExit: HTTP Error ...
========================= 1 failed, 44 passed in 1.40s =========================

Another thing: it would be good to mark all units that need more than localhost with the `network` pytest mark, which is widely used in many other modules' test suites.
https://docs.pytest.org/en/7.1.x/example/markers.html
Many distributions' build environments are intentionally cut off from the public network, and in such conditions running pytest -m "not network" would make it easy to skip those units.
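Since the build log above already runs `pytest -ra -m 'not network'`, registering and applying the marker should be enough. A sketch of a conftest.py using pytest's documented pytest_configure hook and config.addinivalue_line; the marker description text is an assumption:

```python
# conftest.py (sketch): register a "network" marker so that
#     pytest -m "not network"
# deselects marked tests without "unknown marker" warnings.


def pytest_configure(config):
    config.addinivalue_line(
        "markers", "network: test needs access to the public network"
    )


# test/test_get_filelike.py would then mark its network-dependent test:
#
#     import pytest
#
#     @pytest.mark.network
#     def test_get_filelike():
#         ...
```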

1.2.0: missing git tag

According to https://pypi.org/project/imagesize/#history the latest version is 1.2.0, but there is no git tag for that version in the repo.

Incorrect conversion factor for `pt` to `px`

imagesize_py/imagesize.py

Line 74 in 5b8557e

return int(length) * 96 / 6

Assuming DPI=96, converting points to pixels should be 96 / 72. SVG files using pt are wrong by a factor of 12.
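Spelled out with the issue's assumption of 96 px per inch, and 1 in = 72 pt = 6 pc (so 96 / 6 is the pica factor, not the point factor):

$$
1\,\mathrm{pt} = \frac{96}{72}\,\mathrm{px} = \frac{4}{3}\,\mathrm{px},
\qquad
\frac{96/6}{96/72} = \frac{72}{6} = 12
$$

For example, 12pt should be 16px, whereas multiplying by 96 / 6 gives 192px.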

BufferedReader instances aren't used correctly

Hello! Thanks for the awesome package!

I tried doing something similar to:

with open(path, "rb") as f:
  imagesize.get(f)

But got an exception, since get expects the parameter to be either BytesIO or a PathLike/str, and it turns out open() returns a BufferedReader which is neither.

This should be a simple fix, since BufferedReader implements the same API that is needed from BytesIO.
I'd like to suggest a change: if the parameter is a str or PathLike, open it; in any other case, try to use it as a buffer.

This would solve my use case and open up more possibilities (e.g. a BufferedReader reading from HTTP could be used for #44).

Does this sound good? If so, I'd be happy to submit a PR.

Thanks!
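The proposed dispatch could look like this small context manager: str, bytes and os.PathLike values are opened as paths, while anything with a .read() method (io.BytesIO, io.BufferedReader, an HTTP response) is used as-is and left for the caller to close. `open_binary` is a hypothetical name and not necessarily how a PR would implement it:

```python
import contextlib
import os


@contextlib.contextmanager
def open_binary(source):
    """Yield a readable binary stream for a path or an already-open file-like object."""
    if isinstance(source, (str, bytes, os.PathLike)):
        # A path: open it here and close it when the with-block ends.
        with open(source, "rb") as fhandle:
            yield fhandle
    elif hasattr(source, "read"):
        # A file-like object: use it directly; the caller keeps ownership.
        yield source
    else:
        raise TypeError(f"Expected a path or a binary file-like object, got {type(source).__name__}")
```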

webp support

Could you include webp support? Thank you!
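WebP is a RIFF container whose first chunk is 'VP8 ' (lossy), 'VP8L' (lossless) or 'VP8X' (extended), and each stores the dimensions differently. A stdlib-only sketch of reading them from the first 30 bytes, following the WebP container and lossless bitstream layouts; `webp_size` is a hypothetical name and this is not imagesize's implementation (the README above lists WebP among the supported formats):

```python
import struct


def webp_size(head: bytes) -> tuple:
    """Return (width, height) from the first 30 bytes of a WebP file."""
    if len(head) < 30 or head[:4] != b"RIFF" or head[8:12] != b"WEBP":
        raise ValueError("Not a WebP file")
    chunk = head[12:16]  # first chunk FourCC; its payload starts at offset 20
    if chunk == b"VP8X":
        # Extended: 1 flags byte, 3 reserved bytes, then 24-bit canvas width-1 and height-1.
        width = int.from_bytes(head[24:27], "little") + 1
        height = int.from_bytes(head[27:30], "little") + 1
    elif chunk == b"VP8L":
        # Lossless: signature 0x2f, then 14-bit width-1 and 14-bit height-1 (LSB first).
        if head[20] != 0x2F:
            raise ValueError("Bad VP8L signature")
        bits = int.from_bytes(head[21:25], "little")
        width = (bits & 0x3FFF) + 1
        height = ((bits >> 14) & 0x3FFF) + 1
    elif chunk == b"VP8 ":
        # Lossy: 3-byte frame tag, start code 9d 01 2a, then 16-bit fields with 14-bit sizes.
        if head[23:26] != b"\x9d\x01\x2a":
            raise ValueError("Bad VP8 start code")
        width, height = struct.unpack("<HH", head[26:30])
        width &= 0x3FFF
        height &= 0x3FFF
    else:
        raise ValueError(f"Unknown WebP chunk {chunk!r}")
    return width, height
```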

Extended function to support Buffer and io.BufferedReader.

I adapted your function to work with bytes and io.BufferedReader. Feel free to take it if you want; it's helpful when you work with buffers.

Note: this approach doesn't support XML (SVG), because ElementTree works with files.

Thanks for your repo.

import io
import struct


def image_size(src):
    """
    Based on: https://github.com/shibukawa/imagesize_py
    Return (width, height) for a file path, bytes, or an open io.BufferedReader.
    No requirements beyond the standard library.
    :rtype: Tuple[int, int]
    """
    assert isinstance(src, (bytes, io.BufferedReader, str))
    height = -1
    width = -1
    cursor = 0

    if isinstance(src, str):
        with open(src, 'rb') as fhandle:
            buffer = fhandle.read()
    elif isinstance(src, io.BufferedReader):
        buffer = src.read()
        src.close()
    else:
        buffer = src

    head = buffer[:24]
    size = len(head)
    # handle GIFs (logical screen width/height are unsigned little-endian shorts)
    if size >= 10 and head[:6] in (b'GIF87a', b'GIF89a'):
        try:
            width, height = struct.unpack("<HH", head[6:10])
        except struct.error:
            raise ValueError("Invalid GIF file")
    # handle PNGs: signature, IHDR chunk length, b'IHDR', then width and height
    elif size >= 24 and head.startswith(b'\211PNG\r\n\032\n') and head[12:16] == b'IHDR':
        try:
            width, height = struct.unpack(">LL", head[16:24])
        except struct.error:
            raise ValueError("Invalid PNG file")
    # Maybe this is for an older PNG version.
    elif size >= 16 and head.startswith(b'\211PNG\r\n\032\n'):
        try:
            width, height = struct.unpack(">LL", head[8:16])
        except struct.error:
            raise ValueError("Invalid PNG file")
    # handle JPEGs: skip marker segments until a SOFn (start of frame) block
    elif size >= 2 and head.startswith(b'\377\330'):
        try:
            size = 2
            ftype = 0
            while not 0xc0 <= ftype <= 0xcf or ftype in [0xc4, 0xc8, 0xcc]:
                cursor += size
                byte = buffer[cursor:cursor+1]
                cursor += 1
                while ord(byte) == 0xff:
                    byte = buffer[cursor:cursor+1]
                    cursor += 1
                ftype = ord(byte)
                size = struct.unpack('>H', buffer[cursor:cursor+2])[0] - 2
                cursor += 2
            # We are at a SOFn block
            cursor += 1  # Skip `precision' byte.
            height, width = struct.unpack('>HH', buffer[cursor:cursor+4])
            cursor += 4
        except (struct.error, TypeError):  # TypeError: ord() on b'' past the end of data
            raise ValueError("Invalid JPEG file")
    # handle JPEG2000s (reads a fixed offset of 48, assuming the usual box layout)
    elif size >= 12 and head.startswith(b'\x00\x00\x00\x0cjP  \r\n\x87\n'):
        cursor = 48
        try:
            height, width = struct.unpack('>LL', buffer[cursor:cursor+8])
        except struct.error:
            raise ValueError("Invalid JPEG2000 file")
    # handle big-endian TIFF
    elif size >= 8 and head.startswith(b"\x4d\x4d\x00\x2a"):
        offset = struct.unpack('>L', head[4:8])[0]
        cursor = offset
        ifdsize = struct.unpack(">H", buffer[cursor:cursor+2])[0]
        cursor += 2
        for _ in range(ifdsize):
            tag, datatype, count, data = struct.unpack(">HHLL", buffer[cursor:cursor+12])
            cursor += 12  # advance to the next 12-byte IFD entry
            if tag == 256:
                if datatype == 3:
                    width = data >> 16  # a SHORT is left-justified in the 4-byte field
                elif datatype == 4:
                    width = data
                else:
                    raise ValueError("Invalid TIFF file: width column data type should be SHORT/LONG.")
            elif tag == 257:
                if datatype == 3:
                    height = data >> 16
                elif datatype == 4:
                    height = data
                else:
                    raise ValueError("Invalid TIFF file: height column data type should be SHORT/LONG.")
            if width != -1 and height != -1:
                break
        if width == -1 or height == -1:
            raise ValueError("Invalid TIFF file: width and/or height IFD entries are missing.")
    # handle little-endian TIFF
    elif size >= 8 and head.startswith(b"\x49\x49\x2a\x00"):
        offset = struct.unpack('<L', head[4:8])[0]
        cursor = offset
        ifdsize = struct.unpack("<H", buffer[cursor:cursor+2])[0]
        cursor += 2
        for _ in range(ifdsize):
            tag, datatype, count, data = struct.unpack("<HHLL", buffer[cursor:cursor+12])
            cursor += 12  # advance to the next 12-byte IFD entry
            if tag == 256:
                width = data
            elif tag == 257:
                height = data
            if width != -1 and height != -1:
                break
        if width == -1 or height == -1:
            raise ValueError("Invalid TIFF file: width and/or height IFD entries are missing.")
    return width, height

SVG support

For our needs I have added a dirty SVG support:
https://github.com/GNS3/gns3-server/blob/4d8cf8460ef35829041432a49c48c8d173b1822a/gns3server/utils/picture.py

It's not battle-tested and is only used in an experiment with no performance requirements, but I hope it could be useful to someone.

Thanks for the project!

[1.4.0] Missing GIT tag from GH

https://pypi.org/project/imagesize/1.4.0/ is available via pip, but the corresponding 1.4.0 git tag (for commit 2742754) is missing.

Create git tag for release 1.0.0

There’s imagesize 1.0.0 on PyPI, but no corresponding tag in this repository. Please add it.

[1.2.0] Git tag and some 1.2.0 changes missing in Git master?

Hi!

I noticed that the tag-based release listing of this repository does not show a release 1.2.0 while the listing on PyPI does. So I had a closer look and found that the latest release on PyPI has some tiny changes that I cannot find in Git history. Maybe that can be fixed? Am I missing something?

Thanks and best, Sebastian

diff -ur imagesize_py/README.rst imagesize-1.2.0/README.rst
--- imagesize_py/README.rst     2020-03-05 01:49:29.201029087 +0100
+++ imagesize-1.2.0/README.rst  2019-12-26 17:09:43.000000000 +0100
@@ -21,6 +21,12 @@
 * ``imagesize.get(filepath)``
 
   Returns image size (width, height).
+  ``get_from_bytes(bytes)`` is for bytes.
+
+* ``imagesize.getDPI(filepath)``
+
+  Returns DPI value.
+  ``getDPI_from_bytes(bytes)`` is for bytes.
 
 Benchmark
 ------------
@@ -83,4 +89,6 @@
 * Jon Dufresne (https://github.com/jdufresne)
 * Geoff Lankow (https://github.com/darktrojan)
 * Hugo (https://github.com/hugovk)
-
+* Jack Cherng (https://github.com/jfcherng)
+* Tyler A. Young (https://github.com/s3cur3)
+* Mark Browning (https://github.com/mabrowning)
diff -ur imagesize_py/setup.cfg imagesize-1.2.0/setup.cfg
--- imagesize_py/setup.cfg      2020-03-05 01:49:29.201029087 +0100
+++ imagesize-1.2.0/setup.cfg   2019-12-26 17:13:14.000000000 +0100
@@ -2,4 +2,9 @@
 universal = 1
 
 [metadata]
-license_file = LICENSE.rst
\ No newline at end of file
+license_file = LICENSE.rst
+
+[egg_info]
+tag_build = 
+tag_date = 0
+
diff -ur imagesize_py/setup.py imagesize-1.2.0/setup.py
--- imagesize_py/setup.py       2020-03-05 01:49:29.201029087 +0100
+++ imagesize-1.2.0/setup.py    2019-12-26 17:10:15.000000000 +0100
@@ -3,7 +3,7 @@
 from setuptools import setup
 
 setup(name='imagesize',
-      version='1.1.0',
+      version='1.2.0',
       description='Getting image size from png/jpeg/jpeg2000/gif file',
       long_description='''
 It parses image files' header and return image size.
@@ -13,6 +13,7 @@
 * JPEG2000
 * GIF
 * TIFF (experimental)
+* SVG
 
 This is a pure Python library.
 ''',
@@ -37,6 +38,7 @@
           'Programming Language :: Python :: 3.5',
           'Programming Language :: Python :: 3.6',
           'Programming Language :: Python :: 3.7',
+          'Programming Language :: Python :: 3.8',
           'Programming Language :: Python :: Implementation :: CPython',
           'Programming Language :: Python :: Implementation :: PyPy',
           'Topic :: Multimedia :: Graphics'

FAILED test/test_get_filelike.py::test_get_filelike - assert (-1, -1) == (1240, 826)

I'm facing the test failure below:

E assert (-1, -1) == (1240, 826)
E At index 0 diff: -1 != 1240
E Use -v to get more diff

test/test_get_filelike.py:35: AssertionError
====================short test summary info ===============================================
FAILED test/test_get_filelike.py::test_get_filelike - assert (-1, -1) == (1240, 826)
=================1 failed, 44 passed in 1.75s =================================================

This test tries to get the image size and asserts that it equals (1240, 826).
But the image URL used in the test (https://github.com/shibukawa/imagesize_py/blob/master/test/test_get_filelike.py#L26) no longer exists, so the size comes back as (-1, -1) and the assertion fails.

Include the test images in the PyPI release?

Hi,

Currently, the test files ('test.png' and so on) aren't included in the PyPI tarball.

Will you include the images used by the test suite in the PyPI release tarball?

Or do you prefer that packagers use the tarballs from GitHub?