Comments (5)

iandees commented on June 29, 2024

If we gzip the CSV before we put it on S3 and tell S3 that it's gzipped
(by setting Content-Encoding: gzip on the object), S3 will serve it with
headers that make a browser decompress it before showing or downloading it.
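
A minimal sketch of that upload, assuming Python and a boto3-style `put_object` call. The helper name, bucket, key, and CSV columns are hypothetical and only illustrate the idea; the snippet itself uses just the standard library and does not talk to S3.

```python
import gzip


def gzipped_upload_args(bucket, key, csv_bytes):
    """Build keyword arguments for a gzip-encoded upload (hypothetical helper)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": gzip.compress(csv_bytes),  # the stored object is the gzipped bytes
        "ContentType": "text/csv",         # what the payload is once decoded
        "ContentEncoding": "gzip",         # metadata S3 returns as a response header
    }


# Hypothetical bucket, key, and columns, for illustration only.
args = gzipped_upload_args(
    "example-bucket",
    "example.csv",
    b"LON,LAT,NUMBER,STREET\n-122.4,37.8,1,MAIN ST\n",
)
# With boto3 installed, this could be sent as:
#   boto3.client("s3").put_object(**args)
```

Note that S3 stores and returns the header as given; it does not check whether the client asked for gzip.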

from machine.

NelsonMinar commented on June 29, 2024

Unfortunately, always serving gzip breaks HTTP; you're not supposed to send gzip unless the client asks for it with Accept-Encoding. It only works if the browser expects and supports Content-Encoding: gzip. curl and wget both explicitly don't deal with gzip by default.

I set up a demo of this at http://com.somebits.s3gzip.s3-website-us-west-2.amazonaws.com/gzipped.txt. It loads fine in Chrome. It fails with curl but works with curl --compressed. wget doesn't even have a way to decompress; you have to do wget | gunzip. That's why I think it's better to explicitly name the file .csv.gz: the user will know what to do with it.

Lots more detail about this weirdness in S3 on my blog.
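
The difference between the clients can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not how any of those tools are implemented; `read_body` and its `understands_gzip` flag are hypothetical names.

```python
import gzip


def read_body(headers, body, understands_gzip):
    """Return the bytes a client would hand to the user (hypothetical model)."""
    encoding = headers.get("Content-Encoding", "").lower()
    if encoding == "gzip" and understands_gzip:
        # Browser-like client: undo the content encoding transparently.
        return gzip.decompress(body)
    # Client that ignores Content-Encoding: the user gets the raw gzip stream.
    return body


# The server always sends gzip, whether or not the client asked for it.
headers = {"Content-Encoding": "gzip"}
body = gzip.compress(b"hello\n")

browser_view = read_body(headers, body, understands_gzip=True)
plain_client_view = read_body(headers, body, understands_gzip=False)
```

Here `browser_view` is the original text, while `plain_client_view` still starts with the gzip magic bytes, which is what a user sees when the tool does not decode the response.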

NelsonMinar commented on June 29, 2024

@migurski mentioned running out of disk in some of his EC2 runs. If we only produce compressed products, we can save a lot of disk while a job is running. A full run of 790 sources wrote 28G to disk. If I tar + gzip the whole directory, that gets it down to 17G, about 60% of the original size. Not an enormous savings, but it can only help. (I suspect the savings aren't bigger because a lot of the 28G is cached shapefile zips that are already compressed.)
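
One way to avoid ever writing the uncompressed CSV is to stream rows straight into a gzip file. This is a rough sketch using only the Python standard library; the function name, file name, and columns are hypothetical and not taken from the project.

```python
import csv
import gzip
import os
import tempfile


def write_rows_gzipped(path, rows):
    """Write CSV rows directly to a .csv.gz file, never touching a plain .csv."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Hypothetical sample rows, for illustration only.
rows = [["LON", "LAT"], ["-122.4", "37.8"]]
out_path = os.path.join(tempfile.mkdtemp(), "out.csv.gz")
write_rows_gzipped(out_path, rows)

# Read it back to confirm the round trip.
with gzip.open(out_path, "rt", newline="", encoding="utf-8") as f:
    round_trip = list(csv.reader(f))
```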

NelsonMinar commented on June 29, 2024

We had a discussion in Slack and @migurski, @iandees, and I all agree we should zip the CSV output and serve .zip files instead of .csv files. This lets us avoid the whole S3/gzip encoding question. It also allows us to put extra files into the product if we want, maybe a .vrt file or some other metadata.
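
A minimal sketch of such a product, assuming Python's standard `zipfile` module. The helper name, member names, and file contents are hypothetical, and this is not necessarily how the project ended up implementing it.

```python
import io
import zipfile


def build_product_zip(csv_name, csv_bytes, extras=None):
    """Pack the CSV plus optional extra files into one .zip (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(csv_name, csv_bytes)
        # Extra members, e.g. a .vrt file or other metadata.
        for name, data in (extras or {}).items():
            zf.writestr(name, data)
    return buf.getvalue()


# Hypothetical contents, for illustration only.
zip_bytes = build_product_zip(
    "out.csv",
    b"LON,LAT\n-122.4,37.8\n",
    extras={"out.vrt": b"<OGRVRTDataSource/>"},
)
names = zipfile.ZipFile(io.BytesIO(zip_bytes)).namelist()
```

Because the object is an ordinary .zip, it can be uploaded with no Content-Encoding at all, so every client downloads the same bytes.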

migurski commented on June 29, 2024

PR #84 does the thing. Does one of you want to have a look before I merge?
