Comments (6)
OK, I am seeing the same behavior. I did not have a .docx file in my tests, so I have added one and am looking into it now. Thanks, and sorry for the delay.
from simplemagic.
Actually, my local file commands still fail on this. Can you post your magic file somewhere, maybe on pastebin.com?
Here you go. This is the magic file that came with Cygwin for file 5.13: https://gist.github.com/zAlbee/8241169
I'm guessing this is the relevant part:
#------------------------------------------------------------------------------
# $File: msooxml,v 1.2 2013/01/25 23:04:37 christos Exp $
# msooxml: file(1) magic for Microsoft Office XML
# From: Ralf Brown <[email protected]>
# .docx, .pptx, and .xlsx are XML plus other files inside a ZIP
# archive. The first member file is normally "[Content_Types].xml".
# Since MSOOXML doesn't have anything like the uncompressed "mimetype"
# file of ePub or OpenDocument, we'll have to scan for a filename
# which can distinguish between the three types
# start by checking for ZIP local file header signature
0 string PK\003\004
# make sure the first file is correct
>0x1E string [Content_Types].xml
# skip to the second local file header
# since some documents include a 520-byte extra field following the file
# header, we need to scan for the next header
>>(18.l+49) search/2000 PK\003\004
# now skip to the *third* local file header; again, we need to scan due to a
# 520-byte extra field following the file header
>>>&26 search/1000 PK\003\004
# and check the subdirectory name to determine which type of OOXML
# file we have
# Correct the mimetype with the registered ones:
# http://technet.microsoft.com/en-us/library/cc179224.aspx
>>>>&26 string word/ Microsoft Word 2007+
!:mime application/vnd.openxmlformats-officedocument.wordprocessingml.document
>>>>&26 string ppt/ Microsoft PowerPoint 2007+
!:mime application/vnd.openxmlformats-officedocument.presentationml.presentation
>>>>&26 string xl/ Microsoft Excel 2007+
!:mime application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>>>&26 default x Microsoft OOXML
!:strength +10
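To make the rule above easier to follow, here is a rough Python sketch of the same steps. It is only an illustration of the logic as I read it, not how file(1) or simplemagic implements it. The names find_after, detect_ooxml and make_sample are made up for this example, and find_after only approximates what search/N does.

```python
import io
import struct
import zipfile

ZIP_SIG = b"PK\x03\x04"  # ZIP local file header signature
FIRST_NAME = b"[Content_Types].xml"
WORD = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
PPT = "application/vnd.openxmlformats-officedocument.presentationml.presentation"
EXCEL = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
TYPES = [
    (b"word/", "Microsoft Word 2007+", WORD),
    (b"ppt/", "Microsoft PowerPoint 2007+", PPT),
    (b"xl/", "Microsoft Excel 2007+", EXCEL),
]


def find_after(data, start, span, needle):
    """Approximate search/span: return the offset just past the first
    needle that begins within span bytes of start, or -1 if none."""
    pos = data.find(needle, start, start + span + len(needle))
    return -1 if pos < 0 else pos + len(needle)


def detect_ooxml(data):
    """Return (description, mime) for OOXML-looking bytes, else None."""
    # 0 string PK\003\004, then >0x1E string [Content_Types].xml
    if not data.startswith(ZIP_SIG) or not data.startswith(FIRST_NAME, 0x1E):
        return None
    # (18.l+49): little-endian compressed size at offset 18, plus the
    # 30-byte fixed header and the 19-byte name, lands on the second header
    # when there is no extra field; the search covers the case where there is.
    (size,) = struct.unpack_from("<I", data, 18)
    second = find_after(data, size + 49, 2000, ZIP_SIG)
    if second < 0:
        return None
    # 26 bytes past the signature is the second member's name; scan on from there.
    third = find_after(data, second + 26, 1000, ZIP_SIG)
    if third < 0:
        return None
    name_at = third + 26  # start of the third member's file name
    for prefix, description, mime in TYPES:
        if data.startswith(prefix, name_at):
            return description, mime
    return "Microsoft OOXML", None  # the "default" line carries no MIME type


def make_sample(third_member):
    """Build a tiny in-memory ZIP laid out like an OOXML package (test data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("_rels/.rels", "<Relationships/>")
        zf.writestr(third_member, "<x/>")
    return buf.getvalue()


print(detect_ooxml(make_sample("word/document.xml")))
```

The sample archive is synthetic, so this says nothing about how real Office files order their members. It only exercises the offsets used by the rule.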
Interesting. I don't support the search/... types yet, but I guess I can add them. What I can do immediately is add the [Content_Types].xml check and at least report Microsoft OOXML.
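As a rough illustration of that fallback (a Python sketch only, not the actual simplemagic code; looks_like_ooxml is a hypothetical name), the check needs just the ZIP signature at offset 0 and the first member name at offset 0x1E:

```python
ZIP_SIG = b"PK\x03\x04"  # ZIP local file header signature
FIRST_NAME = b"[Content_Types].xml"


def looks_like_ooxml(data):
    """Return "Microsoft OOXML" when the first ZIP member is
    [Content_Types].xml, else None. No search/... support is needed."""
    if data.startswith(ZIP_SIG) and data.startswith(FIRST_NAME, 0x1E):
        return "Microsoft OOXML"
    return None


# Hand-built header: signature, the other 26 fixed header bytes (zeroed), then the name.
sample = ZIP_SIG + bytes(26) + FIRST_NAME
print(looks_like_ooxml(sample))
```

This cannot tell Word, Excel and PowerPoint apart, which is why the full rule goes on to scan for the third local file header.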
So version 1.5 has much better processing of the 2007+ versions of these files. Thanks again.
Thanks! I tested it out on .docx, .xlsx, and .pptx and they are working now. I forgot to mention that .xls and .ppt aren't recognized either (though .doc is). I can file a separate issue for those if you want.