modules: consider switching to zstd for modules archives (cue issue, closed)

myitcv commented on June 9, 2024

rogpeppe commented on June 9, 2024

I'm not against this per se, but I'd add some nuance to a couple of the points above:

> our module cache will basically always extract module archives, not just for cue/load as it is now, but also for the LSP by design

This makes it sound a bit as if module archives will always be fully extracted to disk for CUE evaluation, but I can certainly imagine significant situations where that might not be necessary: for example, when evaluating CUE in a non-interactive situation, such as in a browser or for a one-shot server-side evaluation. The zip format could arguably provide significant performance advantages there, as it would allow decompressing only the files in the required packages.
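
A minimal Go sketch of that partial-extraction idea, using only the standard library's archive/zip. It assumes that a CUE package corresponds to a single directory inside the archive; `buildArchive` and `extractPackage` are hypothetical helpers for illustration, not part of cmd/cue or modzip.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"path"
)

// buildArchive creates a small in-memory zip archive for the demo.
func buildArchive(entries map[string]string) (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, body := range entries {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, body); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

// extractPackage decompresses only the entries that sit directly in dir
// (assumed here to be one CUE package). The zip central directory lists
// every entry up front, so the other entries are skipped without being
// decompressed.
func extractPackage(zr *zip.Reader, dir string) (map[string][]byte, error) {
	files := make(map[string][]byte)
	for _, f := range zr.File {
		if path.Dir(f.Name) != dir {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		files[f.Name] = data
	}
	return files, nil
}

func main() {
	zr, err := buildArchive(map[string]string{
		"cue.mod/module.cue": `module: "example.com/m"`,
		"a/a.cue":            "package a\nx: 1\n",
		"b/b.cue":            "package b\ny: 2\n",
	})
	if err != nil {
		panic(err)
	}
	files, err := extractPackage(zr, "a")
	if err != nil {
		panic(err)
	}
	for name := range files {
		fmt.Println(name) // only a/a.cue is decompressed
	}
}
```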

> we could always consider a middle ground, like a zip archive with zstd compression, if we must retain the ability to address single files inside the zip

I'd be inclined to avoid that, for now at least, because of Russ's reservations about tooling support under Windows.

mvdan commented on June 9, 2024

> I can certainly imagine significant situations where that might not be necessary: for example, when evaluating CUE in a non-interactive situation, such as in a browser or for a one-shot server-side evaluation.

How would you display errors? You would need paths/filenames in some form for the sake of debugging. I guess you could somehow treat the zip archive as a directory, but it would break anything that expects absolute paths to actually be files on disk, e.g. CI failure log viewers or terminals/editors where filenames are linkified.

> The zip format could arguably provide significant performance advantages there, as it would allow decompressing only the files in the required packages.

I think this argument goes both ways. tar.zst would compress far better than zip, which means an immediate performance and storage improvement for serving and fetching module archives; this is a win for everyone. As for decompressing and extracting, it depends on how often we think the client will need to decompress entire module archives. My opinion is that cmd/cue will do that far more often than not, e.g. when running cue export, so that errors point to real files on disk that the user can open.

> I'd be inclined to avoid that, for now at least, because of Russ's reservations about tooling support under Windows.

I agree that zip files with zstd compression are likely not the best option: they do marginally improve compression, but as a middle-ground solution, they make no one happy :)

rogpeppe commented on June 9, 2024

> I can certainly imagine significant situations where that might not be necessary: for example, when evaluating CUE in a non-interactive situation, such as in a browser or for a one-shot server-side evaluation.

> How would you display errors? You would need paths/filenames in some form for the sake of debugging. I guess you could somehow treat the zip archive as a directory, but it would break anything that expects absolute paths to actually be files on disk, e.g. CI failure log viewers or terminals/editors where filenames are linkified.

This is a good question. In general, even file names aren't sufficient, because they're relative to the local filesystem, which varies from machine to machine. One possibility is some kind of URL notation (plausible, since it might well be possible to point directly to the registry the source came from); another is a custom notation that identifies the source module and version.
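
To make the second option concrete, here is a hypothetical Go position type that names the module and version instead of a local path. The `ModulePos` type and the `module@version/file:line:col` layout are invented for illustration only; CUE's error positions are file-based today.

```go
package main

import "fmt"

// ModulePos is a hypothetical source position that names a module and
// version instead of a local file path, so the same error reads the
// same on every machine.
type ModulePos struct {
	Module  string // module path, e.g. "example.com/foo"
	Version string // module version, e.g. "v1.2.3"
	File    string // file path relative to the module root
	Line    int
	Column  int
}

// String renders the position as module@version/file:line:col.
func (p ModulePos) String() string {
	return fmt.Sprintf("%s@%s/%s:%d:%d", p.Module, p.Version, p.File, p.Line, p.Column)
}

func main() {
	p := ModulePos{Module: "example.com/foo", Version: "v1.2.3", File: "bar/baz.cue", Line: 10, Column: 3}
	fmt.Println(p) // example.com/foo@v1.2.3/bar/baz.cue:10:3
}
```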

In general, I wouldn't want to make it infeasible to evaluate CUE in situations where there's no available filesystem, and conversely, I think that tying ourselves to file-based error messages is probably a bit too limiting (a temporary file name might mean nothing to a user where a more domain-focused name might be more informative).

> The zip format could arguably provide significant performance advantages there, as it would allow decompressing only the files in the required packages.

> I think this argument goes both ways. tar.zst would compress far better than zip, which means an immediate performance and storage improvement for serving and fetching module archives; this is a win for everyone. As for decompressing and extracting, it depends on how often we think the client will need to decompress entire module archives. My opinion is that cmd/cue will do that far more often than not, e.g. when running cue export, so that errors point to real files on disk that the user can open.

Note that cue export does not need to decompress the entire archive: it could decompress just the packages that are required. With large modules, that could be a significant win.

Note that I'm not against using .tar.zst in principle, but we should understand the trade-offs before making the leap.

mvdan commented on June 9, 2024

> In general, I wouldn't want to make it infeasible to evaluate CUE in situations where there's no available filesystem, and conversely, I think that tying ourselves to file-based error messages is probably a bit too limiting (a temporary file name might mean nothing to a user where a more domain-focused name might be more informative).

Fair enough. To be clear, our error messages are already filename-based today, so I'm speaking in practical terms about the status quo.

> Note that cue export does not need to decompress the entire archive: it could decompress just the packages that are required. With large modules, that could be a significant win.

Fair enough again. I don't expect module archives to be large today, but it's hard to predict how large they might get in the future.

I think we're in general agreement that we're OK with keeping standard zips for our first artifact version, application/vnd.cue.module.v1+json. Zips are compatible with io/fs, which we're aiming to move towards for APIs like cue/load, whereas compressed tar archives require decompressing all (or most?) of the archive to locate a file or to implement io/fs methods like ReadDir.

For some rough realistic numbers, I ran a quick test of zip vs tar.zst on our latest alpha source archive:

- cue-v0.8.0-alpha.4 uncompressed: about 15MiB
- cue-v0.8.0-alpha.4.zip: 3.4MiB
- cue-v0.8.0-alpha.4.tar.zst: 2.3MiB with the default compression level (3, fast), and 1.7MiB with a high level (19)

So standard zip can take up to twice as much space as a well-compressed tar.zst. Network bandwidth and disk space are relatively cheap these days, so I don't think halving the archive size is worth losing io/fs support.
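
The ratios behind "up to twice as much space" can be rechecked quickly. This Go sketch only recomputes them from the three sizes listed above; the `ratio` helper is illustrative.

```go
package main

import "fmt"

// ratio returns how many times larger a is than b.
func ratio(a, b float64) float64 { return a / b }

func main() {
	const uncompressedMiB = 15.0 // cue-v0.8.0-alpha.4 source, approximate
	const zipMiB = 3.4
	sizes := []struct {
		name string
		mib  float64
	}{
		{"zip (deflate)", zipMiB},
		{"tar.zst, level 3", 2.3},
		{"tar.zst, level 19", 1.7},
	}
	for _, s := range sizes {
		fmt.Printf("%-18s %.1f MiB  %.1fx vs uncompressed  zip is %.2fx this size\n",
			s.name, s.mib, ratio(uncompressedMiB, s.mib), ratio(zipMiB, s.mib))
	}
}
```

By these numbers, the zip is about 1.5 times the size of the level-3 tar.zst and twice the size of the level-19 one.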

I'm also warming up to the idea that zip with zstd compression rather than deflate may be the future, e.g. for an application/vnd.cue.module.v2+json in a few years. I found https://nickb.dev/blog/there-and-back-again-with-zstd-zips/ illuminating in this respect; it seems like you can still get decent size reductions by swapping the per-file compression algorithm, to the point that the difference in size between zip+zstd and tar.zst might be rather small in most cases. zstd compression is already part of the ZIP spec, so I suspect it will become rather common in a matter of a few years.

For all the reasons above, I'm happy to ship v0.8.0 as currently implemented, with standard deflate zips. We can consider a v2 with zip+zstd in a few years, for the sake of decent network and disk usage wins, without losing io/fs compatibility at all.

One point we raised with @rogpeppe and @myitcv was to redesign the current modzip package so that it doesn't hard-code assumptions about zip archives, and is instead generic over any archive format that is io/fs compatible and compresses file by file. I actually struggled to find another reasonably well-established format of that kind; TAR is definitely not one. Per the zip+zstd blog post above, https://github.com/electron/asar does exist, but it isn't nearly as well established and doesn't bring significant benefit. I'm not even sure it's often used with per-file compression.

So I'm actually fine with leaving the modzip API as it is currently. I don't think we will move away from zip archives in the next decade, at least not while io/fs compatibility is the top priority, which it is.

mvdan commented on June 9, 2024

We agree with @rogpeppe to close this as "won't fix" following the reasoning above, and @myitcv is happy with the outcome as well; closing for now. We can create a new issue or proposal in the future for zip+zstd or any other "v2" module archive format.
