Comments (6)

JoshData commented on August 19, 2024

This is probably a bug or limitation in the libmagic library. If you drop the period from the extension argument (i.e. extension="xls"), it should fall back to the extension and hopefully work.

from messytables.

domoritz commented on August 19, 2024

Should be fixed in b63480e

If you don't provide an extension:

ValueError: Unrecognized MIME type: application/CDFV2-corrupt

If you provide .xls (which should really be xls):

ValueError: Could not determine MIME type and unrecognized extension: .xls

If you provide a correct extension xls:

<messytables.excel.XLSTableSet object at 0x10e97d250>
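
The three outcomes above are consistent with a lookup that tries the detected MIME type first and then falls back to the extension, matched without a leading period. Here is a minimal sketch of that idea. It is not the messytables implementation: `MIME_HANDLERS`, `EXT_HANDLERS`, and `pick_handler` are hypothetical names, and the error messages only mimic the ones quoted above.

```python
# Hypothetical lookup tables for illustration; the real messytables tables may differ.
MIME_HANDLERS = {"application/vnd.ms-excel": "XLSTableSet"}
EXT_HANDLERS = {"xls": "XLSTableSet"}


def pick_handler(mimetype=None, extension=None):
    """Return a handler name, preferring the MIME type over the extension."""
    if mimetype in MIME_HANDLERS:
        return MIME_HANDLERS[mimetype]
    if extension is None:
        raise ValueError("Unrecognized MIME type: %s" % mimetype)
    # Keys carry no leading period, so ".xls" misses while "xls" matches.
    if extension in EXT_HANDLERS:
        return EXT_HANDLERS[extension]
    raise ValueError(
        "Could not determine MIME type and unrecognized extension: %s" % extension
    )
```

In this sketch, `pick_handler("application/CDFV2-corrupt", "xls")` succeeds while the same call with `".xls"` raises. A caller that wants to accept both spellings could pass `extension.lstrip(".")` instead.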

frabcus commented on August 19, 2024

Refiled over at scraperwiki#1, as I want to have a closer look to make it simpler.

rossjones commented on August 19, 2024

If you use any_tableset with a CDF file, no mimetype, and no extension, the code in b63480e returns application/CDFV2-corrupt for me, but if I use magic.fromfile(fileobj.name) I get application/vnd.ms-excel.

Presumably there is data later in the file that definitively identifies it as application/vnd.ms-excel (as CDF could be any office file type).
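
To illustrate why the start of the file is not enough: every OLE2 compound document (the container behind legacy .xls, .doc and .ppt files) begins with the same eight-byte signature, so a check on the first bytes can confirm the container but not the application. A small sketch using only the standard library follows; `looks_like_cdf` is a hypothetical helper, not part of messytables or libmagic.

```python
# Signature shared by all OLE2 / compound document files (.xls, .doc, .ppt, ...).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_cdf(header):
    """Return True if the bytes start with the compound-document signature."""
    return header[:8] == OLE2_SIGNATURE
```

A True result here says nothing about whether the file is a spreadsheet. Deciding that requires reading structures stored further into the file.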

scraperdragon commented on August 19, 2024

Messytables only looks at the first 1k; I believe [citation needed] that magic checks the first 4k.

There's a fix for this in my pull request; not sure whether it fixes that specific example.

scraperdragon commented on August 19, 2024

Nope. It requires at least 64K (of a 95K file).

Tested with:
dd if=Food_price_indices_data_deflated.xls bs=1 count=65536 of=Food_64K.xls

and similar.
(There's a specific C module in the file library for CDF files.)
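
For anyone repeating the experiment without dd, the same truncation can be sketched in Python. `truncate_copy` is a hypothetical helper name, and the file names in the usage note are the ones from the dd command above.

```python
def truncate_copy(src, dst, count):
    """Copy at most `count` leading bytes of `src` into `dst`.

    Returns the number of bytes actually written, which is smaller than
    `count` when the source file is shorter.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        data = fin.read(count)
        fout.write(data)
    return len(data)
```

Calling `truncate_copy("Food_price_indices_data_deflated.xls", "Food_64K.xls", 65536)` would produce the same file as the dd command. Repeating it with smaller counts is one way to look for the point at which detection starts to fail.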
