Comments (7)
That might be true, but without metrics we don't know. Also, IMO the library is fast enough for 99% of use cases. Why do you need top performance here? What is currently impacting you?
from filetype.py.
I tried it on a batch of several thousand files and performance wasn't great but, I admit, mine was very much an edge case. :p
Interesting... my impression is that this is a CPython limitation more than an implementation performance issue, but we can try to improve things. If you could lead this by preparing some performance test scenarios that I can easily reproduce, that would be great.
Hi, preparing a general performance test suite is a bit difficult here because of the nature of the physical medium the tests run on. If we process many files stored on a single HDD in parallel, its I/O limit will be reached very quickly; if the files were split across several SSDs, the result would instead be limited more or less by drive performance.
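A minimal sketch of how the I/O side can be kept small and tunable, assuming the goal is to read only each file's header rather than the whole file. The helpers `read_header`/`read_headers` and the `max_workers` knob are hypothetical, not part of filetype.py; on a single HDD a low worker count is probably best, while files spread across several SSDs may tolerate more.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable

HEADER_SIZE = 256  # signature window discussed in this thread (assumption)


def read_header(path: str, size: int = HEADER_SIZE) -> bytes:
    """Read only the first `size` bytes of a file; the rest is never touched."""
    with open(path, "rb") as fh:
        return fh.read(size)


def read_headers(paths: Iterable[str], max_workers: int = 4) -> Dict[str, bytes]:
    """Read headers concurrently with a bounded thread pool.

    Threads can help here because blocking file reads release the GIL.
    `max_workers` should be tuned to the storage: low for one HDD,
    higher for files spread across several SSDs.
    """
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        headers = list(pool.map(read_header, path_list))
    return dict(zip(path_list, headers))
```

The returned buffers can then be handed to the matcher, so the disk-bound and CPU-bound parts can be timed separately.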
I would suggest that a performance test for this scenario should not involve any I/O at all. Including I/O would make the measurement inaccurate, and therefore irrelevant.
Instead, the performance suite should cover only the code logic we actually want to measure. In this context, that means passing a binary buffer representing the file signature, up to 256 bytes. That's all you need, with no disk I/O involved.
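A minimal sketch of such an I/O-free benchmark, assuming the matcher is any callable that takes a bytes buffer. `toy_guess` is a hypothetical stand-in using a few well-known magic numbers so the snippet runs on its own; to time the real library you would pass `filetype.guess` instead, which accepts `bytes` input.

```python
import time
from typing import Callable, Dict, List, Optional

SIGNATURE_SIZE = 256  # header window mentioned in the discussion

# Hypothetical stand-in matcher: a handful of well-known magic numbers.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"%PDF", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def toy_guess(buf: bytes) -> Optional[str]:
    """Return an extension for the first matching prefix, else None."""
    for magic, ext in MAGIC:
        if buf.startswith(magic):
            return ext
    return None


def benchmark(match: Callable[[bytes], object], buffers: List[bytes],
              rounds: int = 1000) -> Dict[str, float]:
    """Time `match` over in-memory buffers only; no disk I/O is involved."""
    samples = [b[:SIGNATURE_SIZE] for b in buffers]  # same window the matcher sees
    if not samples:
        raise ValueError("need at least one buffer")
    start = time.perf_counter()
    for _ in range(rounds):
        for sample in samples:
            match(sample)
    elapsed = time.perf_counter() - start
    calls = rounds * len(samples)
    return {"calls": calls, "total_s": elapsed, "us_per_call": elapsed / calls * 1e6}


if __name__ == "__main__":
    buffers = [b"\x89PNG\r\n\x1a\n" + bytes(300), b"%PDF-1.7\n", b"no magic here"]
    print(benchmark(toy_guess, buffers, rounds=10_000))
    # To time the library itself: benchmark(filetype.guess, buffers)
```

Because the buffers live in memory, repeated runs should be comparable across machines regardless of whether the files originally sat on an HDD or SSD.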
Ok, I'll try to prepare a draft of the new code and open a PR. ;)
Magic bytes alone don't work for complex container formats like ISO-BMFF (MP4, MOV, HEIF/HEIC) and Matroska (MKV, WebM): their headers need to be parsed to determine the exact format.
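A minimal sketch of what "parsing the headers" could mean for these two families, written independently of filetype.py's internals. It reads the leading ISO-BMFF `ftyp` box (major brand plus compatible brands) and walks the EBML header of Matroska files to find the `DocType` element. The function names and the brand/DocType-to-MIME tables are hypothetical and incomplete; it assumes the file starts with `ftyp` and does not handle 64-bit or to-end-of-file box sizes.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical brand -> MIME table; incomplete, extend as needed.
BRAND_MIME = {
    "heic": "image/heic", "heix": "image/heic",
    "mif1": "image/heif", "msf1": "image/heif",
    "avif": "image/avif", "avis": "image/avif",
    "qt  ": "video/quicktime", "M4A ": "audio/mp4",
    "isom": "video/mp4", "iso2": "video/mp4", "mp41": "video/mp4", "mp42": "video/mp4",
}
EBML_MAGIC = b"\x1a\x45\xdf\xa3"
DOCTYPE_ID = 0x4282
DOCTYPE_MIME = {b"webm": "video/webm", b"matroska": "video/x-matroska"}


def parse_ftyp(buf: bytes) -> Optional[Tuple[str, List[str]]]:
    """Parse a leading 'ftyp' box: size, type, major brand, minor version, brands."""
    if len(buf) < 16:
        return None
    size, box_type = struct.unpack(">I4s", buf[:8])
    if box_type != b"ftyp" or size < 16:  # size 0/1 (to-EOF / 64-bit) not handled
        return None
    end = min(size, len(buf))
    major = buf[8:12].decode("latin-1")
    compatible = [buf[i:i + 4].decode("latin-1") for i in range(16, end - 3, 4)]
    return major, compatible


def guess_isobmff(buf: bytes) -> Optional[str]:
    parsed = parse_ftyp(buf)
    if parsed is None:
        return None
    major, compatible = parsed
    for brand in [major] + compatible:  # the major brand takes precedence
        if brand in BRAND_MIME:
            return BRAND_MIME[brand]
    return None


def _read_vint(buf: bytes, pos: int, keep_marker: bool) -> Tuple[int, int]:
    """Decode an EBML variable-length integer; return (value, next position)."""
    first = buf[pos]
    length = next((n for n in range(1, 9) if first & (0x80 >> (n - 1))), 0)
    if length == 0 or pos + length > len(buf):
        raise ValueError("invalid or truncated EBML VINT")
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:  # element sizes drop the length-marker bit, IDs keep it
        value &= (1 << (7 * length)) - 1
    return value, pos + length


def guess_matroska(buf: bytes) -> Optional[str]:
    """Walk the EBML header elements and map the DocType string to a MIME type."""
    if not buf.startswith(EBML_MAGIC):
        return None
    try:
        header_size, pos = _read_vint(buf, 4, keep_marker=False)
        end = min(pos + header_size, len(buf))
        while pos < end:
            elem_id, pos = _read_vint(buf, pos, keep_marker=True)
            elem_size, pos = _read_vint(buf, pos, keep_marker=False)
            if elem_id == DOCTYPE_ID:
                return DOCTYPE_MIME.get(buf[pos:pos + elem_size].rstrip(b"\x00"))
            pos += elem_size
    except (ValueError, IndexError):
        pass
    return None


def guess_container(buf: bytes) -> Optional[str]:
    return guess_isobmff(buf) or guess_matroska(buf)
```

Checking the compatible brands, not just the first bytes, is what lets a parser tell an M4A audio file apart from an MP4 video even though both begin with an `ftyp` box.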
Related Issues (20)
- tests/fixtures/sample.zst is missing from the repository
- "1.0.13" --> "1.1.0" regression. `filetype.guess` stopped working with output from `read(file_path, "rb")`
- xlsx file guess error
- Does not recognize mp3 type
- Support `io.BytesIO` as input
- xls and xlsx guessed as zip
- One type of mp4 is not supported (compatibility with libmagic)
- audio/m4a files are matched as video/mp4
- The csv file is not recognized. Can it be supported
- Whether the detection of the other audio formats can be supported?
- Animated AVIFs aren't recognized
- MS Word misidentified as MS Excel
- Font mimetypes are outdated
- Access to types module blocked by assignment
- Incorrect documentation for functions accepting "bytes"
- support python 3.10+
- Add support to txt file
- Empty docx file can't seem to get file type
- Docx detected as Zip due to trash files
- MS Word misidentified as MS Excel