Comments (7)

h2non commented on May 22, 2024

That might be true, but without metrics we don't know. Also, IMO the library is fast enough for 99% of use cases. Why do you need excellent performance here? What's currently impacting you?

from filetype.py.

vuolter commented on May 22, 2024

I tried it on a batch of several thousand files and performance wasn't so great, but, I admit, mine was very much an edge case. :p

h2non commented on May 22, 2024

Interesting... my impression is that this is a CPython limitation rather than an implementation performance issue, but we can try to improve things. If you can lead this by preparing some performance test scenarios that I can easily reproduce, that would be great.

vuolter commented on May 22, 2024

Hi, preparing a general performance test suite is a bit difficult here because of the nature of the physical medium on which the tests will be performed. If we try to process that many files stored on a single HDD in parallel, its I/O limit will be reached very quickly, whereas if the files were split across several SSDs, the result would be more or less limited by drive performance.

h2non commented on May 22, 2024

I would suggest that a performance test for these scenarios should not involve any I/O at all. Including I/O would make the measurements inaccurate, and therefore irrelevant.

Instead, the performance suite should only exercise the code logic being measured. In this context, that means passing in a binary buffer representing the file signature, up to 256 bytes. That's all you need; no disk I/O is involved.
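A minimal sketch of such an I/O-free benchmark, assuming Python's standard `timeit` module. The sample headers and the `toy_guess` matcher are hypothetical stand-ins so the snippet runs on its own; with filetype.py installed you would pass `filetype.guess` (which accepts a `bytes` buffer) instead.

```python
import timeit
from typing import Callable, Dict, Optional

# Hypothetical 256-byte in-memory headers; only the leading signature bytes matter.
SAMPLES: Dict[str, bytes] = {
    "png": b"\x89PNG\r\n\x1a\n" + bytes(248),
    "jpg": b"\xff\xd8\xff\xe0" + bytes(252),
    "gif": b"GIF89a" + bytes(250),
    "pdf": b"%PDF-1.7" + bytes(248),
}


def toy_guess(buf: bytes) -> Optional[str]:
    """Hypothetical stand-in matcher that checks a few magic-byte prefixes."""
    if buf.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if buf.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if buf.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if buf.startswith(b"%PDF-"):
        return "pdf"
    return None


def bench(guess: Callable[[bytes], object], number: int = 10_000) -> Dict[str, float]:
    """Return the mean seconds per call for each in-memory sample (no disk I/O)."""
    results: Dict[str, float] = {}
    for name, buf in SAMPLES.items():
        # timeit runs immediately, so the lambda always sees the current `buf`.
        total = timeit.timeit(lambda: guess(buf), number=number)
        results[name] = total / number
    return results


if __name__ == "__main__":
    for name, secs in bench(toy_guess).items():
        print(f"{name}: {secs * 1e6:.3f} µs/call")
```

Passing the matcher in as a callable keeps the harness independent of the library's internals, so the same buffers can be timed before and after a change.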

vuolter commented on May 22, 2024

OK, I'll try to prepare a draft of the new code and open a PR. ;)

commented on May 22, 2024

Magic bytes don't work for complex container types like ISO-BMFF (MP4, MOV, HEIF/HEIC) and Matroska (MKV, WEBM). The headers need to be parsed to determine the format.
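A minimal sketch of what "parsing the headers" can look like, under our own assumptions rather than filetype.py's actual implementation. For ISO-BMFF it reads the leading `ftyp` box (32-bit size, type, major brand, minor version, compatible brands); for Matroska it walks the EBML header looking for the DocType element (ID `0x4282`). The brand-to-format table and the function names `sniff_isobmff` / `sniff_matroska` are hypothetical and deliberately incomplete; 64-bit box sizes and unknown EBML sizes are not handled.

```python
import struct
from typing import Optional, Tuple

# Illustrative, non-exhaustive brand -> format mapping (an assumption).
FTYP_BRANDS = {
    b"isom": "mp4", b"iso2": "mp4", b"mp41": "mp4", b"mp42": "mp4",
    b"qt  ": "mov",
    b"heic": "heic", b"heix": "heic", b"mif1": "heif", b"msf1": "heif",
}


def sniff_isobmff(buf: bytes) -> Optional[str]:
    """Inspect a leading 'ftyp' box: size(4) type(4) major(4) minor(4) compat brands."""
    if len(buf) < 16 or buf[4:8] != b"ftyp":
        return None
    size = struct.unpack(">I", buf[0:4])[0]
    if size < 16 or size > len(buf):  # 64-bit 'largesize' (size == 1) not handled here
        return None
    brands = [buf[8:12]] + [buf[i:i + 4] for i in range(16, size - 3, 4)]
    for brand in brands:  # major brand first, then compatible brands
        if brand in FTYP_BRANDS:
            return FTYP_BRANDS[brand]
    return None


def _read_vint(buf: bytes, pos: int, keep_marker: bool) -> Tuple[int, int]:
    """Decode an EBML variable-length integer; IDs keep the length marker, sizes drop it."""
    first = buf[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8 or pos + length > len(buf):
        raise ValueError("invalid or truncated vint")
    value = first if keep_marker else first & (mask - 1)
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def sniff_matroska(buf: bytes) -> Optional[str]:
    """Walk the EBML header (ID 0x1A45DFA3) and map its DocType to mkv/webm."""
    try:
        elem_id, pos = _read_vint(buf, 0, keep_marker=True)
        if elem_id != 0x1A45DFA3:
            return None
        size, pos = _read_vint(buf, pos, keep_marker=False)
        end = min(pos + size, len(buf))
        while pos < end:
            elem_id, pos = _read_vint(buf, pos, keep_marker=True)
            size, pos = _read_vint(buf, pos, keep_marker=False)
            if elem_id == 0x4282:  # DocType
                doctype = buf[pos:pos + size].rstrip(b"\x00")
                return {b"matroska": "mkv", b"webm": "webm"}.get(doctype)
            pos += size
    except (IndexError, ValueError):
        return None
    return None
```

Both functions only need the first few hundred bytes of a file, so they fit the same in-memory buffer model discussed earlier in the thread.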
