Comments (8)
4.8.0 has been released, which includes the new functionality discussed here!
from swcompression.
Hi @philmitchell,
Thank you for the PR, and happy new year to you too!
I had a look and, wow, this `autoreleasepool` thing does wonders: repeating the same tests as I did previously in this thread results in 0.19 GB of "maximum resident set size" when using `TarReader` + `autoreleasepool` vs. 2.93 GB for `TarContainer.open`.
As I understand now, the new `TarReader`/`TarWriter` types won't help with memory consumption issues by themselves: users will still have to use `autoreleasepool` to achieve the desired effect. So I wonder if it's possible to move the `autoreleasepool` inside `TarReader`, because otherwise these new types seem useless.
One way to achieve this would be to add a new function, something like `TarReader.processNext<Result>(_ f: (TarEntry) -> Result) throws -> Result`, which obtains a new `TarEntry` (similar to `next()`) and calls the user-supplied function on it inside the `autoreleasepool`.
Anyway, this is a question for me to think about later in the week (alongside merging your PR, etc.), so thank you once again!
I have some news to share.
First of all, as I discovered today, `autoreleasepool` is a Darwin-only thing, so it is not available on Linux and Windows. Apart from having to provide a compatibility implementation for it on those platforms, I wonder what the memory usage situation is there, i.e. whether this new `TarReader`/`TarWriter` stuff will be helpful at all on Linux/Windows. I've also found a relevant discussion on the Swift forums about `autoreleasepool` availability.
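One possible shape for such a compatibility implementation, assuming the platform ships no `autoreleasepool` of its own, is a pass-through that keeps call sites compiling unchanged. With no Objective-C runtime there is nothing to drain, so whether memory actually behaves comparably there is exactly the open question:

```swift
import Foundation

#if !canImport(ObjectiveC)
// Non-Darwin platforms have no Objective-C runtime, so there is nothing to
// drain; a pass-through lets the same call sites compile everywhere.
func autoreleasepool<Result>(invoking body: () throws -> Result) rethrows -> Result {
    return try body()
}
#endif

// On Darwin the system function is used; elsewhere the shim above is.
let value = autoreleasepool { 21 * 2 }
```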
Secondly, I merged the `tar-rw` branch into `develop` and made some additional changes:

- `TarReader.next()` has been renamed to `TarReader.read()`.
- `TarReader.process(_:)` has been added (this is the function that automatically uses `autoreleasepool`).
- Removed the `--use-rw-api` option from `swcomp tar`: it will now automatically use `TarReader`/`TarWriter` if possible (which means when no additional compression options are supplied).
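The behaviour described for `read()` and `process(_:)` can be sketched with stand-in types (an assumption of the shape, not the real SWCompression signatures):

```swift
import Foundation

// Compact stand-in mirroring the described API shape: process(_:) hands the
// closure an optional entry and returns whatever the closure produces.
struct TarEntry { let name: String }

struct TarReader {
    private var remaining: [TarEntry]
    init(entries: [TarEntry]) { remaining = entries }

    // Formerly next(): returns nil once the archive is exhausted.
    mutating func read() throws -> TarEntry? {
        remaining.isEmpty ? nil : remaining.removeFirst()
    }

    // Same read, but performed inside an autoreleasepool so per-entry
    // temporaries are released before the closure returns.
    mutating func process<T>(_ transform: (TarEntry?) throws -> T) throws -> T {
        #if canImport(ObjectiveC)
        return try autoreleasepool { try transform(try read()) }
        #else
        return try transform(try read())   // no pool off-Darwin
        #endif
    }
}

// Consumption pattern: an exhausted reader maps to nil, ending the loop.
var reader = TarReader(entries: [TarEntry(name: "x"), TarEntry(name: "y")])
var seen = [String]()
do {
    while let entry = try reader.process({ $0 }) {
        seen.append(entry.name)
    }
} catch { fatalError("\(error)") }
```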
Note that there are no changes to `TarWriter`. As much as I would like to make it use `autoreleasepool` internally, so users wouldn't have to think about these subtleties, I don't think it is possible. So the only solution there would be to mention in the documentation something along the lines of "you should wrap `autoreleasepool` around the `TarWriter` stuff".
To answer your main question: no, there has been no progress on this issue.
However, I see now that there is a lot of demand for this kind of functionality. As such, I will try to do something about it, starting with your specific case (creating a TAR). I will report back with some sort of implementation soon (hopefully by the end of this week, but no guarantees).
Thanks a lot for the quick response - I deeply appreciate it.
This functionality would help immensely.
For my specific case, I'm trying to create a tar file and write it to disk efficiently without holding so much data in memory. So a function that accepts tar entries (with a URL pointing to the entry's file rather than its Data) and a URL to write the archive to in chunks would be very helpful as well.
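A sketch of the chunked disk I/O half of that request using plain `FileHandle` (chunk size and file names here are arbitrary, not anything SWCompression prescribes):

```swift
import Foundation

// Copy a (potentially large) file in fixed-size chunks so only one chunk
// is ever resident, instead of loading the whole Data up front.
func copyInChunks(from source: URL, to destination: URL, chunkSize: Int = 1 << 20) throws {
    FileManager.default.createFile(atPath: destination.path, contents: nil)
    let input = try FileHandle(forReadingFrom: source)
    let output = try FileHandle(forWritingTo: destination)
    defer {
        input.closeFile()
        output.closeFile()
    }
    while true {
        let chunk = input.readData(ofLength: chunkSize)
        if chunk.isEmpty { break }   // EOF
        output.write(chunk)
    }
}

// Tiny demonstration with a temporary file.
var copiedCount = -1
do {
    let dir = FileManager.default.temporaryDirectory
    let src = dir.appendingPathComponent("chunk-src.bin")
    let dst = dir.appendingPathComponent("chunk-dst.bin")
    try Data(repeating: 7, count: 3_000_000).write(to: src)
    try copyInChunks(from: src, to: dst)
    copiedCount = try Data(contentsOf: dst).count
} catch { fatalError("\(error)") }
```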
Thanks again!
So I did implement something. I added a set of relatively simple APIs that use Foundation's `FileHandle`:

```swift
public struct TarReader {
    public init(fileHandle: FileHandle)
    public mutating func next() throws -> TarEntry?
}

public struct TarWriter {
    public init(fileHandle: FileHandle, format: TarContainer.Format = .pax)
    public mutating func append(_ entry: TarEntry) throws
    public func finalize() throws
}
```
These should help avoid loading everything into memory at once and only do it as necessary.
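To make the streaming idea concrete, here is a toy length-prefixed record stream (deliberately not the tar format) built on the same `FileHandle` reader/writer split, so that neither side ever holds the whole archive in memory:

```swift
import Foundation

// Toy record container on FileHandle, mirroring the API shape above:
// the writer appends one record at a time, the reader pulls one at a time.
struct RecordWriter {
    let handle: FileHandle
    func append(_ payload: Data) {
        var length = UInt32(payload.count).littleEndian
        handle.write(Data(bytes: &length, count: 4))   // 4-byte LE length prefix
        handle.write(payload)
    }
    func finalize() {
        var zero = UInt32(0).littleEndian              // empty record = end marker
        handle.write(Data(bytes: &zero, count: 4))
    }
}

struct RecordReader {
    let handle: FileHandle
    func next() -> Data? {
        let header = handle.readData(ofLength: 4)
        guard header.count == 4 else { return nil }
        let b = [UInt8](header)
        let length = UInt32(b[0]) | UInt32(b[1]) << 8 | UInt32(b[2]) << 16 | UInt32(b[3]) << 24
        guard length > 0 else { return nil }           // hit the end marker
        return handle.readData(ofLength: Int(length))
    }
}

// Round trip through a temporary file.
let url = FileManager.default.temporaryDirectory.appendingPathComponent("records.bin")
FileManager.default.createFile(atPath: url.path, contents: nil)
let writer = RecordWriter(handle: try! FileHandle(forWritingTo: url))
writer.append(Data("hello".utf8))
writer.append(Data("world".utf8))
writer.finalize()
writer.handle.closeFile()

let reader = RecordReader(handle: try! FileHandle(forReadingFrom: url))
var strings = [String]()
while let payload = reader.next() {
    strings.append(String(decoding: payload, as: UTF8.self))
}
```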
I've tried to check whether they indeed help with memory usage. I used the `/usr/bin/time -lp` command to measure "maximum resident set size" and tarred "Microsoft Word.app" (2.26 GB) as a test input. The results are a bit confusing. For the `TarContainer.open` function I got 3.265 GB vs. 1.746 GB for the `TarReader` version. For `TarContainer.create` and its `TarWriter` counterpart I got 3.515 GB vs. 2.279 GB, respectively. The full terminal output is available in the attached file (though it may be a bit cryptic): term_output.txt
So, what do you think about all of this? I should note that I have no idea whether or not this metric of "maximum resident set size" is a good one to evaluate memory usage. As I assume you did some memory-related measurements before, any advice on how to better estimate memory usage would be appreciated!
P.S. If you are interested in testing these new functions yourself, they are available in the `tar-rw` branch.
I just submitted a PR to address this, but only for the case of untarring. If you want, I'll look at tarring, too ... but I happen to need only untarring :)
I used Xcode Instruments to confirm that, with the addition of `autoreleasepool`, the new `rw-api` branch can untar quite large files without any increase in transient or persistent memory usage.
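Schematically, the change amounts to wrapping each `next()` call site in a pool (stand-in reader for illustration; on Linux/Windows the pool is skipped):

```swift
import Foundation

// Stand-in for the FileHandle-based reader described earlier in the thread.
struct TarEntry { let name: String }

struct TarReader {
    private var remaining: [TarEntry]
    init(entries: [TarEntry]) { remaining = entries }
    mutating func next() throws -> TarEntry? {
        remaining.isEmpty ? nil : remaining.removeFirst()
    }
}

// Untar loop: one pool drain per entry keeps transient memory flat even
// for large archives.
var reader = TarReader(entries: (1...3).map { TarEntry(name: "file\($0)") })
var extracted = [String]()
var finished = false
do {
    while !finished {
        #if canImport(ObjectiveC)
        try autoreleasepool {
            if let entry = try reader.next() {
                extracted.append(entry.name)   // write entry to disk here
            } else {
                finished = true
            }
        }
        #else
        if let entry = try reader.next() {
            extracted.append(entry.name)
        } else {
            finished = true
        }
        #endif
    }
} catch { fatalError("\(error)") }
```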
Thanks Timofey ... and happy new year!
Ha, that's super interesting! It was surprising to discover that we had to manually release memory during the file read, but now I'm really curious how it works in Swift on Linux/Windows. I'm swamped at the moment, but I will look into this when I can.
Regarding the removal of `--use-rw-api`: good call, I agree it should be the default. I actually ran some (very crude) tests to see if there was a performance hit when reading the tar file in chunks, and if anything it performs better. If that's correct, there's no reason to do it any other way.