tonyvalenti / mime-detective-clarkis117
This project forked from ofthelit/mime-detective
Mime type detector for files, byte arrays, and streams, .NET Standard Fork
License: MIT License
Word / Excel 97-2003 files may be detected as MSDOC type and not WORD or EXCEL.
I ran into the issue by creating a blank Excel file in Excel 2007 and saving it as XLS (see blank.zip).
From what I could check, Office 97-2003 file signatures are based on "subheaders", and there may be several of them without clear documentation. Such files, however, are detected by the library as MSDOC.
I would therefore suggest adding a generic Office 97-2003 type, something like:
// OLECF - Object Linking and Embedding (OLE) Compound File (CF)
// Compound Binary File format by Microsoft, used by Microsoft Office 97-2003 applications (Word, PowerPoint, Excel, Wizard)
public static readonly FileType MS_OFFICE = new FileType(new byte?[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }, "doc,ppt,xls", "application/octet-stream");
Since the type appears after WORD and EXCEL types, the detection would first match based on subheaders and default to this one if the subheader does not match.
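A minimal sketch of the ordering idea, in C#: signatures are tried from most to least specific, so the bare OLE magic only wins when no subheader matches. The Signature and OleDetector types are hypothetical (not the library's FileType API), and the offset-512 subheader bytes are example values assumed from public signature tables; they should be checked against real files.

```csharp
using System.Collections.Generic;

// Hypothetical signature model: every part is a byte pattern at a fixed offset.
// null entries in a pattern act as wildcards.
public sealed class Signature
{
    public string Name { get; }
    public IReadOnlyList<(int Offset, byte?[] Pattern)> Parts { get; }

    public Signature(string name, params (int Offset, byte?[] Pattern)[] parts)
    {
        Name = name;
        Parts = parts;
    }

    public bool Matches(byte[] data)
    {
        foreach (var (offset, pattern) in Parts)
        {
            // Data too short to contain this part: no match, and no exception.
            if (offset + pattern.Length > data.Length) return false;
            for (int i = 0; i < pattern.Length; i++)
                if (pattern[i].HasValue && pattern[i].Value != data[offset + i]) return false;
        }
        return true;
    }
}

public static class OleDetector
{
    static readonly byte?[] OleMagic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Order matters: specific (magic + subheader) entries first, generic fallback last.
    // Subheader values are assumptions taken from public signature tables.
    public static readonly List<Signature> Ordered = new List<Signature>
    {
        new Signature("doc", (0, OleMagic), (512, new byte?[] { 0xEC, 0xA5, 0xC1, 0x00 })),
        new Signature("xls", (0, OleMagic), (512, new byte?[] { 0x09, 0x08, 0x10, 0x00, 0x00, 0x06, 0x05, 0x00 })),
        new Signature("doc,ppt,xls", (0, OleMagic)), // generic Office 97-2003 fallback
    };

    // Returns the name of the first matching signature, or null if none matches.
    public static string Detect(byte[] data)
    {
        foreach (var sig in Ordered)
            if (sig.Matches(data)) return sig.Name;
        return null;
    }
}
```

A blank XLS whose subheader differs from every known one would then still be reported as the generic "doc,ppt,xls" type instead of a specific but wrong one.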
Hi @clarkis117 -
I have recently done a complete rewrite of Mime Detective and added the ability to detect over 14,000 different file types.
I'm interested in publishing my update and, with your permission, taking over maintenance of the nuget package. Can you contact me to discuss?
We found out today that files analyzed with the stream extension (GetFileTypeAsync) were saved corrupted, because we had not reset the stream position before writing the stream to a file.
I would therefore suggest resetting the stream position to 0 when the stream is not disposed, or at least stating this explicitly in the comments/wiki/docs.
mimeTypes.GetFileType(() => fileData, stream, shouldDisposeStream: false)
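A sketch of the suggested behaviour, assuming a seekable stream: read the header bytes, then restore the original position in a finally block so the stream can still be saved afterwards. PeekHeader is a hypothetical helper, not part of the library.

```csharp
using System;
using System.IO;

public static class StreamHeader
{
    // Reads up to `count` header bytes, then restores the original position
    // so the caller can still save or re-read the whole stream.
    public static byte[] PeekHeader(Stream stream, int count)
    {
        if (!stream.CanSeek)
            throw new NotSupportedException("Stream must be seekable; copy it to a MemoryStream first.");

        long start = stream.Position;
        var buffer = new byte[count];
        int total = 0;
        try
        {
            while (total < count)
            {
                int read = stream.Read(buffer, total, count - total);
                if (read == 0) break; // end of stream: file shorter than `count`
                total += read;
            }
        }
        finally
        {
            stream.Position = start; // rewind even if reading throws
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }
}
```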
When loading an old Microsoft Office file into a stream or byte array, the MimeType returned by GetFileType() is doc,ppt,xls, not the expected xls or doc.
For example, both tests below fail:
[Fact]
public void CanReadExcelFileFromByteArray()
{
    var result = File.ReadAllBytes("./data/Documents/XlsExcel2007.xls").GetFileType();
    Assert.NotNull(result);
    Assert.Equal(MimeTypes.EXCEL, result);
}
[Fact]
public void CanReadExcelFileFromStream()
{
    using (FileStream stream = File.Open("./data/Documents/XlsExcel2007.xls", FileMode.Open))
    {
        var result = stream.GetFileType(false, true);
        Assert.NotNull(result);
        Assert.Equal(MimeTypes.EXCEL, result);
    }
}
Any ideas how to get this to work without having to save the files to disk and then loading them using the FileInfo object?
I have used your tool very successfully; however, my code below returns null when an mp4 file is tested:
(The filePath extension is mp4)
NSUrl videoFileURL = NSUrl.FromString(filePath);
Uri uri = new Uri(videoFileURL.ToString());
StreamReader streamReader = new StreamReader(filePath);
FileInfo fileInfo = new FileInfo(filePath);
Stream stream = streamReader.BaseStream;
FileType fileType = MimeDetective.FileInfoExtensions.GetFileType(fileInfo);
(The fileType here is null)
This code works fine for .mov files (QuickTime) but fails with mp4. Please let me know whether I am using it incorrectly for this type of file or whether there is actually an issue.
Thanks!
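One thing worth checking for MP4: these files have no fixed magic bytes at offset 0, because the first four bytes hold the size of the first box. In ISO base media files the box type "ftyp" sits at offset 4 and is followed by a four-character major brand (for example isom or mp42). The helper below is a hypothetical sketch of that check, not the library's MP4 definition.

```csharp
using System.Text;

public static class IsoBmff
{
    // ISO base media files (MP4, M4A, 3GP, many MOV files) start with a box:
    // a 4-byte big-endian size, then a 4-byte type. The first box is usually "ftyp",
    // whose payload begins with a 4-byte major brand such as "isom" or "mp42".
    // Returns the brand, or null if the header does not look like an ftyp box.
    public static string MajorBrand(byte[] header)
    {
        if (header.Length < 12) return null;
        if (Encoding.ASCII.GetString(header, 4, 4) != "ftyp") return null;
        return Encoding.ASCII.GetString(header, 8, 4);
    }
}
```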
Hi,
I tried to detect a PDF file, but got 'txt' as the result:
// invoices: an IFormFileCollection (e.g. the uploaded files of the request)
foreach (var invoice in invoices)
{
    using (var ms = invoice.OpenReadStream())
    {
        FileType fileType = ms.GetFileType();
    }
}
The file starts with the bytes 25 50 44 46 ("%PDF"), so it should be detected as PDF. These are the first bytes of the file:
25 50 44 46 2d 31 2e 37 0a 25 ef bf bd ef bf bd ef bf bd ef bf bd 0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 54 79 70 65 2f 43 61 74 61 6c 6f 67 2f 50 61 67 65 73 20 32 20 30 20 52 2f 4c 61 6e 67 28 74 72 2d 54 52 29 20 2f 53 74 72 75 63 74 54 72 65 65 52 6f 6f 74 20 31 30 20 30 20 52 2f 4d 61 72 6b 49 6e 66 6f 3c 3c 2f 4d 61 72 6b 65 64 20 74 72 75 65 3e 3e 2f 4d 65 74 61 64 61 74 61 20 32 31 20 30 20 52 2f 56 69 65 77 65 72 50 72 65 66 65 72 65 6e 63 65 73 20 32 32 20 30 20 52 3e 3e 0a 65 6e 64 6f 62 6a 0a 32 20 30 20 6f 62 6a 0a 3c 3c 2f 54 79 70 65 2f 50 61 67 65 73 2f 43 6f 75 6e 74 20 31 2f 4b 69 64 73 5b 20 33 20 30 20 52 5d 20 3e
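The ef bf bd sequences in this dump are the UTF-8 encoding of U+FFFD (the replacement character), which suggests the binary marker bytes after %PDF-1.7 were replaced during a text round trip; that can make the data pass a "looks like text" test. Either way, checking the %PDF- signature before any text heuristic avoids the problem. A sketch, where PdfFirst and its NUL-byte text test are illustrative assumptions:

```csharp
using System.Text;

public static class PdfFirst
{
    static readonly byte[] PdfMagic = Encoding.ASCII.GetBytes("%PDF-");

    // Check binary signatures before any "is this plain text?" heuristic,
    // because a PDF header is itself printable ASCII.
    public static string Detect(byte[] data)
    {
        if (StartsWith(data, PdfMagic)) return "application/pdf";
        if (LooksLikeText(data)) return "text/plain";
        return null;
    }

    static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }

    // Deliberately rough heuristic (assumption): no NUL bytes means "text".
    static bool LooksLikeText(byte[] data) => System.Array.IndexOf(data, (byte)0) < 0;
}
```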
.NET Core project
net451 support would be nice to have, because many companies are still limited to this framework.
Calling GetFileType(this byte[] bytes) on an XML file returns "plain/text" instead of "application/xml".
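A minimal sketch of an XML check that could run before the plain-text fallback, assuming the document starts with an XML declaration, optionally preceded by a UTF-8 BOM. XmlSniffer is hypothetical; XML without a declaration would still need another rule.

```csharp
using System.Text;

public static class XmlSniffer
{
    static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };
    static readonly byte[] XmlDecl = Encoding.ASCII.GetBytes("<?xml");

    // True if the data starts with "<?xml", optionally after a UTF-8 BOM.
    public static bool IsXml(byte[] data)
    {
        int start = HasPrefix(data, Utf8Bom, 0) ? Utf8Bom.Length : 0;
        return HasPrefix(data, XmlDecl, start);
    }

    static bool HasPrefix(byte[] data, byte[] prefix, int offset)
    {
        if (data.Length - offset < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[offset + i] != prefix[i]) return false;
        return true;
    }
}
```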
There should be unit tests that use local sample files included in the unit test project.
Empty ZIP files are a special kind of ZIP. The header differs.
File contents: 50 4b 05 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Calling GetFileType(this byte[] bytes) returns null.
$ file -i ZIPempty.zip
ZIPempty.zip: application/zip; charset=binary
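For reference, a sketch that accepts the three common "PK" record signatures at offset 0. An archive with no entries contains only the 22-byte End of Central Directory record (PK\x05\x06), which matches the dump above. ZipSignatures is a hypothetical helper.

```csharp
public static class ZipSignatures
{
    // "PK" followed by a record type. An empty archive consists only of the
    // End of Central Directory record, so it starts with PK\x05\x06 instead of
    // the usual local file header PK\x03\x04.
    public static bool IsZip(byte[] data)
    {
        if (data.Length < 4 || data[0] != 0x50 || data[1] != 0x4B) return false;
        return (data[2] == 0x03 && data[3] == 0x04)   // local file header
            || (data[2] == 0x05 && data[3] == 0x06)   // empty archive (EOCD only)
            || (data[2] == 0x07 && data[3] == 0x08);  // spanned archive marker
    }
}
```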
Methods in MimeTypes.cs that return FileType usually have this comment:
/// <summary>
/// Read header of a file and depending on the information in the header
/// return object FileType.
/// *Return null in case when the file type is not identified.*
/// Throws Application exception if the file can not be read or does not exist
/// </summary>
But because of this line, https://github.com/clarkis117/Mime-Detective/blob/master/src/Mime-Detective/MimeTypes.cs#L314, the statement about the return value isn't true.
The input stream is also disposed after calling GetFileType().
Regards!
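Until the disposal behaviour is documented or configurable, one common workaround is to hand the library a wrapper whose Dispose does not close the underlying stream. A sketch (NonDisposingStream is a hypothetical name, not a library type):

```csharp
using System;
using System.IO;

// Forwards all I/O to an inner stream but never disposes it.
// Stream.Dispose(bool) in the base class does no work of its own, and this class
// never forwards disposal to _inner, so disposing the wrapper leaves _inner open.
public sealed class NonDisposingStream : Stream
{
    private readonly Stream _inner;

    public NonDisposingStream(Stream inner) =>
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _inner.Length;
    public override long Position { get => _inner.Position; set => _inner.Position = value; }

    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => _inner.SetLength(value);
    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
}
```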
I ran into the same issue as #12 in the original repository.
The proposed change was to run the file signature detection first, and then plain text detection.
That would definitely fix the issue and be more reliable.
Would you consider making this change?
If so, it might then be possible to improve the plain text detection to detect the file encoding.
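As a starting point for encoding detection, a sketch that recognizes byte order marks only; BOM-less files would still need a heuristic. BomSniffer is hypothetical. UTF-32 LE is checked before UTF-16 LE because FF FE 00 00 also starts with FF FE.

```csharp
using System.Text;

public static class BomSniffer
{
    // Returns the encoding indicated by a byte order mark, or null if there is none.
    public static Encoding Detect(byte[] data)
    {
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return new UTF8Encoding(true);
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return new UTF32Encoding(false, true); // UTF-32 LE
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return new UTF32Encoding(true, true);  // UTF-32 BE
        if (StartsWith(data, 0xFF, 0xFE)) return new UnicodeEncoding(false, true);           // UTF-16 LE
        if (StartsWith(data, 0xFE, 0xFF)) return new UnicodeEncoding(true, true);            // UTF-16 BE
        return null;
    }

    static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```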
Detecting an Excel file from a byte array throws a NullReferenceException at MimeDetective.MimeTypes.FindZipType(ReadResult& readResult) in \src\Mime-Detective\MimeTypes.cs:line 314:
var fileData = File.ReadAllBytes(filePath);
var type = fileData.GetFileType();
Hi @clarkis117,
I've updated the NuGet package to version 0.0.6-beta1, and now I'm getting the error "End of Central Directory record could not be found" with the byte array of an Excel file.
Sample of code: File.ReadAllBytes(filePath).GetFileType()
Sample of file: test.xlsx
TargetFramework: .NET Core 2.0
Could you help me with this?
Additionally, do you have an expected date for the final NuGet release of 0.0.6?
Thank you,
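The message "End of Central Directory record could not be found" is the InvalidDataException text thrown by System.IO.Compression.ZipArchive when it cannot find the ZIP trailer. One plausible cause (an assumption, not confirmed against the library source) is that only the first header bytes are handed to ZipArchive instead of the whole file. A sketch that opens the complete buffer, classifies Office Open XML by part names (xl/, word/, ppt/), and returns null instead of throwing; OoxmlSniffer is hypothetical:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class OoxmlSniffer
{
    // Opens the *whole* buffer as a ZIP and classifies it by well-known part names.
    // ZipArchive needs the End of Central Directory record at the end of the data,
    // so a truncated buffer fails with InvalidDataException, which we map to null.
    public static string Classify(byte[] data)
    {
        if (data.Length < 22) return null; // smaller than an End of Central Directory record
        try
        {
            using (var zip = new ZipArchive(new MemoryStream(data, writable: false), ZipArchiveMode.Read))
            {
                var names = zip.Entries.Select(e => e.FullName).ToList();
                if (names.Any(n => n.StartsWith("xl/", StringComparison.Ordinal))) return "xlsx";
                if (names.Any(n => n.StartsWith("word/", StringComparison.Ordinal))) return "docx";
                if (names.Any(n => n.StartsWith("ppt/", StringComparison.Ordinal))) return "pptx";
                return "zip";
            }
        }
        catch (InvalidDataException)
        {
            return null; // not a ZIP, or truncated data
        }
    }
}
```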
Your current targets are:
netstandard1.3;net45
Would it be possible to also add net471 and netstandard2.0?
I tried including your latest beta nuget into my net471 project and got some unexpected warnings.
Warning Found conflicts between different versions of "System.Net.Http" that could not be resolved. These reference conflicts are listed in the build log when log verbosity is set to detailed.
The dependencies System.Buffers and System.Xml.XmlSerializer were both included; shouldn't they be left out?
It appears that public static FileType LearnMimeType(FileInfo first, FileInfo second, string mimeType, int maxHeaderSize = 12, int minMatches = 2, int maxNonMatch = 3) in MimeDetective.cs checks every other byte for matches and adds the matches to the header of the new file type it returns.
Is this intentional?
Additionally, I'm having a hard time figuring out the purpose of this method. Is it used to find the common header of a known file type/mime extension, or to find the mime type of an unknown file by comparing it to a known mime type? Or something else?
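For comparison, a sketch of what a "learn a common header" helper could do if every byte position were compared. The parameter names mirror the LearnMimeType signature above, but their meaning here is an assumption: positions where the two samples differ become null wildcards, and the result is rejected if too few bytes agree or too many differ. LearnCommonHeader is hypothetical.

```csharp
using System;

public static class HeaderLearner
{
    // Compares every byte position (not every other one) of two samples.
    // Matching bytes are kept; differing bytes become null wildcards.
    // Returns null if fewer than minMatches agree or more than maxNonMatch differ.
    public static byte?[] LearnCommonHeader(byte[] first, byte[] second,
        int maxHeaderSize = 12, int minMatches = 2, int maxNonMatch = 3)
    {
        int length = Math.Min(maxHeaderSize, Math.Min(first.Length, second.Length));
        var header = new byte?[length];
        int matches = 0, nonMatches = 0;
        for (int i = 0; i < length; i++)
        {
            if (first[i] == second[i]) { header[i] = first[i]; matches++; }
            else { header[i] = null; nonMatches++; }
        }
        return matches >= minMatches && nonMatches <= maxNonMatch ? header : null;
    }
}
```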
The latest version really looks nice. I know that this is a personal project, but do you have any idea when you will reach 0.0.6 final?
What remains to be done? If you need some help, maybe add issues so the community can help you out.
Keep up the good work!
Hi @clarkis117,
Sorry for the spam, but I'm having another issue with txt files in version 0.0.6 beta 2.
Exception: System.NullReferenceException: 'Object reference not set to an instance of an object.'
MimeDetective.ByteArrayExtensions.GetFileType(...) returned null.
Use case:
File.ReadAllBytes(filePath).GetFileType()
Sample of file: test.txt
TargetFramework: .NET Core 2.0
Thank you,
Open issue for adding TAP-based async support.
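A minimal TAP-style sketch of the header read an async API would need. It loops on Stream.ReadAsync, because one call may return fewer bytes than requested, and stops at end of stream. ReadHeaderAsync is a hypothetical helper, not an existing library method.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncHeaderReader
{
    // Reads up to `count` bytes; the result is shorter if the stream ends early.
    public static async Task<byte[]> ReadHeaderAsync(Stream stream, int count,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int read = await stream.ReadAsync(buffer, total, count - total, cancellationToken)
                                   .ConfigureAwait(false);
            if (read == 0) break; // end of stream
            total += read;
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }
}
```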
How can I add a custom file type?
I tried the following, without success:
static MyClass()
{
    MimeAnalyzers.PrimaryAnalyzer.Insert(EPS);
}

static void Check()
{
    var type = uploadedFile.InputStream.GetFileType(); // returns null but should return the EPS file.
}

// https://www.garykessler.net/library/file_sigs.html
private static readonly FileType EPS = new FileType(
    new byte?[] { 0x25, 0x21, 0x50, 0x53, 0x2D, 0x41, 0x64, 0x6F,
                  0x62, 0x65, 0x2D, 0x33, 0x2E, 0x30, 0x20, 0x45,
                  0x50, 0x53, 0x46, 0x2D, 0x33, 0x20, 0x30 },
    "eps",
    "application/postscript");
I checked the file with a hex viewer, and the magic bytes above are configured correctly.
Anything I'm missing?
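Two things that may be worth checking (assumptions, not confirmed against the library): the last three bytes above decode to "3 0" with a space (0x20), whereas the EPSF convention for the first line is "%!PS-Adobe-3.0 EPSF-3.0" with a period (0x2E); and a 23-byte exact signature pins both version numbers. A more tolerant sketch matches the "%!PS-Adobe-" prefix and then looks for "EPSF-" on the first line. EpsSniffer is hypothetical, and binary DOS EPS files (which start with C5 D0 D3 C6) are not covered.

```csharp
using System;
using System.Text;

public static class EpsSniffer
{
    // Matches the stable "%!PS-Adobe-" prefix, then looks for " EPSF-" on the
    // first line, instead of one exact signature that pins both version numbers.
    public static bool IsEps(byte[] data)
    {
        const string prefix = "%!PS-Adobe-";
        int lineEnd = Array.FindIndex(data, b => b == (byte)'\n' || b == (byte)'\r');
        if (lineEnd < 0) lineEnd = data.Length;
        string firstLine = Encoding.ASCII.GetString(data, 0, lineEnd);
        return firstLine.StartsWith(prefix, StringComparison.Ordinal)
            && firstLine.IndexOf(" EPSF-", StringComparison.Ordinal) > 0;
    }
}
```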
Hi,
Is CSV supported? I tried to detect a CSV file, but it is detected as WAV.
Thanks,
The MIME type detection crashes (throws an ArgumentException) on files smaller than 560 bytes, e.g. tiny plain text files.
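A sketch of bounds-checked signature matching, so inputs shorter than a signature simply fail to match instead of throwing. The WAV pattern ("RIFF", a 4-byte size, then "WAVE" at offset 8) is shown as an example; SafeMatch is hypothetical and says nothing about why the library reported WAV for a CSV file.

```csharp
public static class SafeMatch
{
    // Compares `pattern` against `data` at `offset`; null pattern bytes are wildcards.
    // Inputs too short for the pattern return false instead of throwing.
    public static bool Matches(byte[] data, int offset, byte?[] pattern)
    {
        if (offset < 0 || data.Length < offset + pattern.Length) return false;
        for (int i = 0; i < pattern.Length; i++)
            if (pattern[i].HasValue && pattern[i].Value != data[offset + i]) return false;
        return true;
    }

    // WAV: "RIFF" at offset 0, a 4-byte chunk size (wildcards), then "WAVE" at offset 8.
    public static readonly byte?[] Wav =
    {
        (byte)'R', (byte)'I', (byte)'F', (byte)'F', null, null, null, null,
        (byte)'W', (byte)'A', (byte)'V', (byte)'E'
    };
}
```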