
tar-stream's Introduction

tar-stream

tar-stream is a streaming tar parser and generator and nothing else. It operates purely using streams which means you can easily extract/parse tarballs without ever hitting the file system.

Note that you still need to gunzip your data if you have a .tar.gz. We recommend using gunzip-maybe in conjunction with this.

npm install tar-stream


Usage

tar-stream exposes two streams, pack which creates tarballs and extract which extracts tarballs. To modify an existing tarball use both.

It implements USTAR with additional support for pax extended headers. It should be compatible with all popular tar distributions out there (gnutar, bsdtar, etc.).

Related

If you want to pack/unpack directories on the file system check out tar-fs which provides file system bindings to this module.

Packing

To create a pack stream use tar.pack() and call pack.entry(header, [callback]) to add tar entries.

const tar = require('tar-stream')
const pack = tar.pack() // pack is a stream

// add a file called my-test.txt with the content "Hello World!"
pack.entry({ name: 'my-test.txt' }, 'Hello World!')

// add a file called my-stream-test.txt from a stream
const entry = pack.entry({ name: 'my-stream-test.txt', size: 11 }, function(err) {
  // the stream was added
  // no more entries
  pack.finalize()
})

entry.write('hello')
entry.write(' ')
entry.write('world')
entry.end()

// pipe the pack stream somewhere
pack.pipe(process.stdout)

Extracting

To extract a stream use tar.extract() and listen for extract.on('entry', (header, stream, next) => { ... }).

const extract = tar.extract()

extract.on('entry', function (header, stream, next) {
  // header is the tar header
  // stream is the content body (might be an empty stream)
  // call next when you are done with this entry

  stream.on('end', function () {
    next() // ready for next entry
  })

  stream.resume() // just auto drain the stream
})

extract.on('finish', function () {
  // all entries read
})

pack.pipe(extract) // e.g. the pack stream from the packing example above

The tar archive is streamed sequentially, meaning you must drain each entry's stream as you get them or else the main extract stream will receive backpressure and stop reading.

Extracting as an async iterator

In addition to being a writable stream, the extraction stream is also an async iterator.

const extract = tar.extract()

someStream.pipe(extract)

for await (const entry of extract) {
  entry.header // the tar header
  entry.resume() // the entry is also the stream
}

Headers

The header object used in entry should contain the following properties. Most of these values can be found by stat'ing a file.

{
  name: 'path/to/this/entry.txt',
  size: 1314,        // entry size. defaults to 0
  mode: 0o644,       // entry mode. defaults to 0o755 for dirs and 0o644 otherwise
  mtime: new Date(), // last modified date for entry. defaults to now.
  type: 'file',      // type of entry. defaults to file. can be:
                     // file | link | symlink | directory | block-device
                     // character-device | fifo | contiguous-file
  linkname: 'path',  // linked file name
  uid: 0,            // uid of entry owner. defaults to 0
  gid: 0,            // gid of entry owner. defaults to 0
  uname: 'maf',      // uname of entry owner. defaults to null
  gname: 'staff',    // gname of entry owner. defaults to null
  devmajor: 0,       // device major version. defaults to 0
  devminor: 0        // device minor version. defaults to 0
}

Modifying existing tarballs

Using tar-stream it is easy to rewrite paths, change modes, etc. in an existing tarball.

const extract = tar.extract()
const pack = tar.pack()
const path = require('path')

extract.on('entry', function (header, stream, callback) {
  // let's prefix all names with 'tmp'
  header.name = path.join('tmp', header.name)
  // write the new entry to the pack stream
  stream.pipe(pack.entry(header, callback))
})

extract.on('finish', function () {
  // all entries done - lets finalize it
  pack.finalize()
})

// pipe the old tarball to the extractor
oldTarballStream.pipe(extract)

// pipe the new tarball to another stream
pack.pipe(newTarballStream)

Saving tarball to fs

const fs = require('fs')
const tar = require('tar-stream')

const pack = tar.pack() // pack is a stream
const path = 'YourTarBall.tar'
const yourTarball = fs.createWriteStream(path)

// add a file called YourFile.txt with the content "Hello World!"
pack.entry({ name: 'YourFile.txt' }, 'Hello World!', function (err) {
  if (err) throw err
  pack.finalize()
})

// pipe the pack stream to your file
pack.pipe(yourTarball)

yourTarball.on('close', function () {
  console.log(path + ' has been written')
  fs.stat(path, function(err, stats) {
    if (err) throw err
    console.log(stats)
    console.log('Got file info successfully!')
  })
})

Performance

See tar-fs for a performance comparison with node-tar

License

MIT


tar-stream's Issues

extraction documentation error?

In the docs on extraction, the line pack.pipe(extract); is at the end of the example, but pack isn't defined and it doesn't make sense to me. Is that supposed to be there?

Broken in New Version of Safari

On iOS 10 (Mobile Safari 10.0) and Desktop Safari 9.1.2, I get:

Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped?

This: decodeOct(buf, 148) is returning NaN.

Fails to pack an archive with non-Latin characters in file names

I've written simple code; it just copies a .tar to another one, logging file names:

const tar = require('tar-stream');
const pack = tar.pack();
const extract = tar.extract();
const path = require('path');
const fs = require('fs');

extract.on('entry', (header, stream, next) => {
    console.log(header.name);
    stream.pipe(pack.entry(header, next));
});

extract.on('finish', () => {
    // all entries done - lets finalize it
    pack.finalize();
});

const tarPath = './example.tar';
const tarPathParsed = path.parse(tarPath);
const outputPath = `${tarPathParsed.dir}/${tarPathParsed.name}.new${tarPathParsed.ext}`;

let oldTarballStream = fs.createReadStream(tarPath);
let newTarballStream = fs.createWriteStream(outputPath);

// pipe the old tarball to the extractor
oldTarballStream.pipe(extract);

newTarballStream.on('close', () => {
    console.log(`${outputPath} has been written`);
});

// pipe the new tarball to another stream
pack.pipe(newTarballStream);

Also, I've created an example.tar with a single file named Тестовый файл.txt (Cyrillic characters in the file name). When I ran my code above, I got example.new.tar with 2 files, both named PaxHeader. One of them contains:

38 path=Тестовый файл.txt

The other PaxHeader contains the full content of Тестовый файл.txt.

Moreover, once I re-ran the code, applying it to example.new.tar (with those 2 PaxHeaders), I got a tarball with, again, 2 PaxHeaders, but one of them was:

38 path=Тестовый файл.txt
38 path=Тестовый файл.txt

The other, again, was exactly my original Тестовый файл.txt.

I believe it's a bug of pack().

Unexpected end of data should raise error when extracting

When the input data ends while tar-stream waits for more data to extract, it doesn't raise an error. This means that a truncated tar file will extract without reporting errors, while creating incomplete files.

If there is still missing data for a file or a partially read header, an error should be raised instead.

This issue is likely also the cause for #71. If only a short file is processed (shorter than a tar header), no errors are raised since the partial header data is never processed.

8GB Limit?

Can you confirm (or not) that there is an 8GB limit (per file) for TAR creation with the implementation of TAR that you use (ustar is it?)

If so, is there any way round this that you know of? Or another library i could use?

Many thanks

Symlinks size is not consistent

There's different behaviour in the entry size field of symlinks depending on whether the link destination is set as a stream or in the header: sometimes the symlink has zero size, and other times it's a symlink to nowhere but with a size equal to the length of the destination path. We need to check what the spec says and make the behaviour consistent.

Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped?

I have a tarball from the registry (and others) that consistently fails with invalid tar header errors but is a valid archive.

To replicate

  1. wget https://registry.npmjs.org/eslint-config-metashop/-/eslint-config-metashop-1.5.0.tgz

  2. Use the following test case

const Tar = require('tar-stream');
const Gunzip = require('gunzip-maybe');
const Fs = require('fs');

const gunzip = Gunzip();
const extract = Tar.extract();
const inputFile = Fs.createReadStream(process.argv[2]);

extract.on('error', function (err) {
  console.log(err);
});

extract.on('entry', function (header, stream, callback) {
  stream.on('end', function () {
    return callback()
  });
  stream.resume()
});

inputFile.on('error', function (err) {
  console.log(err)
});

inputFile.pipe(gunzip).pipe(extract);

  3. node test.js eslint-config-metashop-1.5.0.tgz

The error I get is something like

Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped?
    at Object.exports.decode (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/headers.js:265:40)
    at Extract.onheader [as _onparse] (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:124:39)
    at Extract._write (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:248:8)
    at Extract._continue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:212:28)
    at oncontinue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:65:10)
    at Extract.onheader [as _onparse] (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:132:7)
    at Extract._write (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:248:8)
    at Extract._continue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:212:28)
    at oncontinue (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:65:10)
    at Extract.onheader [as _onparse] (/Users/adam_baldwin/Documents/projects/tarHash/node_modules/tar-stream/extract.js:132:7)

Doing some debugging it appears that the stream has advanced too far into the archive and the data it's trying to pass to parse the header is actually file contents, that's as far as we've got so far.

tar-stream does not handle directory entries well when modifying existing tarballs

Running a slightly modified example "Modifying existing tarballs" code on the compressed tar file http://registry.npmjs.org/which/-/which-1.0.5.tgz which other tar tools say are valid

Generates

processing entry package/bin/

_stream_readable.js:476
  dest.on('unpipe', onunpipe);
       ^
TypeError: Cannot call method 'on' of undefined
    at PassThrough.Readable.pipe (_stream_readable.js:476:8)
    at null.<anonymous> (bug.js:27:9)
    at EventEmitter.emit (events.js:106:17)
    at onheader (node_modules\tar-stream\extract.js:101:9)
    at Extract._write (node_modules\tar-stream\extract.js:172:7)
    at doWrite (_stream_writable.js:226:10)
    at writeOrBuffer (_stream_writable.js:216:5)
    at Writable.write (_stream_writable.js:183:11)
    at write (_stream_readable.js:583:24)
    at flow (_stream_readable.js:592:7)

So this is either a documentation/example improvement, to outline that pack.entry returns a stream or nothing depending on the header.type field, or a bug: extract.on('entry') does seem to return a valid (but empty) stream to read from, but pack.entry does not return a sink to pipe this empty stream into because it is a directory.

Modified example code below

// Location of problem tar file http://registry.npmjs.org/which/-/which-1.0.5.tgz


var tarStream = require('tar-stream');
var zlib = require('zlib');
var fs = require('fs');

var input = "which-1.0.5.tgz"; 
var output = "rewritten_which-1.0.5.tgz";

var inputTarfile = fs.createReadStream(input); 
var outputTarfile = fs.createWriteStream(output);
var gunzip = zlib.createGunzip();
var gzip = zlib.createGzip();

// Stream copy the tar.gz
var tarExtract = tarStream.extract();
var tarPack = tarStream.pack();

tarExtract.on('entry', function(header, stream, callback) {
    console.log('processing entry ' + header.name);
    // write the unmodified entry to the pack stream
    stream.pipe(tarPack.entry(header, callback));
});

tarExtract.on('finish', function() {
    // all entries copied, add new entry
    tarPack.finalize();
    });

//read input
inputTarfile.pipe(gunzip).pipe(tarExtract);

// write output
tarPack.pipe(gzip).pipe(outputTarfile);

Possibly a better example for the documentation, if this is the approach taken to fixing this bug:

var extract = tar.extract();
var pack = tar.pack();
var path = require('path');

extract.on('entry', function(header, stream, callback) {
    // let's prefix all names with 'tmp'
    header.name = path.join('tmp', header.name);
    var entrySink = pack.entry(header, callback);
    // If no entrySink was returned then the entry was not a 'file' or 'contiguous-file'
    // therefore there is nothing to pipe data in to
    if ( typeof entrySink != "undefined") {
        // write the new entry to the pack stream
        stream.pipe(entrySink);
    }
});

extract.on('finish', function() {
    // all entries done - lets finalize it
    pack.finalize();
});

// pipe the old tarball to the extractor
oldTarball.pipe(extract);

// pipe the new tarball to another stream
pack.pipe(newTarball);

pack.entry with streams

hi,

the pack.entry function only accepts buffers, but not streams. I'm not really sure if this is even implementable. Any ideas about this?

Make tar-stream a TransformStream

I mentioned this previously, but I'd like to be able to use tar-stream in this manner (or similar):

intar.pipe(tarStream(onentry, onfinish)).pipe(transformedTar)

I made a feeble attempt to get this to work outside of tar stream in dockerify lazy-stream branch, but had only mixed success.

The tests that all passed previously now only pass in node 0.10.
I hope, though, that this can serve as a start to figure out how the above could be achieved.

No error on invalid input

The following code tries to pipe some nonsense into an extractor:

var tar = require('tar-stream');
var stream = require('stream');
var extract = tar.extract();

extract.on('error', function(e) {
	console.log(e);
});

extract.on('entry', function(header, stream, next) {
	console.log(header);
});

extract.on('finish', function() {
	console.log('finish');
});

var input = new stream.PassThrough();

input.pipe(extract);

input.end(new Buffer('some random content'));

The only output is finish, but it should really emit some sort of error

"Invalid tar header" error on Docker

As shown on this pull-request, I'm converting a cpio file generated with the get_init_cpio tool of the Linux kernel to a tar file. The generated tar file works correctly with vagga, but it crashes on Docker with an "Invalid tar header" error, and the same file makes file-roller (the Ubuntu/Gnome compressed files manager) core dump.

Inspecting the content of the generated file directly with the tar command, I get the following output:

[piranna@Mabuk:~/Proyectos/NodeOS]
 (vagga) > tar -tvf node_modules/nodeos-barebones/out/latest
tar: Substituting `.' for empty member name
d--x--x--x 0/0               0 2015-10-28 12:05
-r-xr-xr-x 0/0          651800 2015-10-28 12:05 lib/libc.so
lr-xr-xr-x 0/0               8 2015-10-28 12:05 lib/ld-musl-x86_64.so.1 -> libc.so
tar: Skipping to next header
-r--r--r-- 0/0         1250352 2015-10-28 12:05 lib/libstdc++.so.6.0.17
lr--r--r-- 0/0              20 2015-10-28 12:05 lib/libstdc++.so.6 -> libstdc++.so.6.0.17
tar: Skipping to next header
l--x------ 0/0               9 2015-10-28 12:05 init -> bin/node
tar: A lone zero block at 25824
tar: Exiting with failure status due to previous errors

[piranna@Mabuk:~/Proyectos/NodeOS]
 (vagga) > echo $?
2

There are two missing entries (the ones with the tar: Skipping to next header message), corresponding to the lib/libgcc_s.so.1 and bin/node files. Their stat objects as given by cpio-stream are:

{ ino: 724,
  mode: 33060,
  uid: 0,
  gid: 0,
  nlink: 1,
  mtime: Wed Oct 28 2015 12:05:19 GMT+0100 (CET),
  size: 96712,
  devmajor: 3,
  devminor: 1,
  rdevmajor: 0,
  rdevminor: 0,
  _nameLength: 18,
  _sizeStrike: 96712,
  _nameStrike: 18,
  name: 'lib/libgcc_s.so.1' }
{ ino: 727,
  mode: 33133,
  uid: 0,
  gid: 0,
  nlink: 1,
  mtime: Wed Oct 28 2015 01:40:48 GMT+0100 (CET),
  size: 11216736,
  devmajor: 3,
  devminor: 1,
  rdevmajor: 0,
  rdevminor: 0,
  _nameLength: 9,
  _sizeStrike: 11216736,
  _nameStrike: 10,
  name: 'bin/node' }

I'm not sure what could be the reason for this problem, since it seems it's not related to file name length, file size, permissions, or the files being binary... :-/ You can find the tar file if you want to inspect it yourself at https://dropfile.to/gWBaf

Unable to set file mode

stream.entry({ name: `files/test.html`, mode: parseInt('777', 8) }, "test......");

But, result in :

➜  la files
total 16
-rwxr-xr-x@ 1 tony  staff   1.4K  7 31 11:30 test.html

Extract documentation error

In the README.md file for the Extracting example you have

next(); // ready for next entry

If you do this it produces an Error that next() is not defined.
Looking at you test code and a simple test I wrote this line should say

callback(); // ready for next entry

Generated tar fails to be unpacked including a unicode directory with some specific pattern

The result.tar generated by the following code fails to be unpacked:

const tar = require('tar-stream');
const writeStream = require('fs').createWriteStream('result.tar');
const pack = tar.pack();
pack.pipe(writeStream);

// the specific pattern I found:
// here, a '0' represents an ASCII character and a '哈' represents a unicode character
const directory = './0000000哈哈000哈哈0000哈哈00哈00哈0哈哈哈哈哈0哈/0000哈哈哈/';
const name = directory + 'somefile.txt';
const entry = pack.entry({ name }, 'any text', (...args) => console.log(args));

pack.finalize();

showing this after executing tar -xf result.tar on terminal

tar: Ignoring malformed pax extended attribute
tar: Error exit delayed from previous errors.

or something like this when double-clicked on Mac OS

Error 1: Operation not allowed

I'm working on Mac OS and have tried the code on node versions 6.9.1 and 7.5.0, producing the same result.

tar-stream works perfectly with almost all other unicode patterns so I think there might be a bug?

large file issue

I've got a tar file with a few large (>10GB) files in it. When tar-stream gets to the first big file, it chokes with an Invalid tar header error. I can't provide the tar file itself, but hopefully this is helpful:

it hums along through a few small files, and then it hits the header for the first large file (provided here base64 encoded):

cnMtZHMwNTk1NDhfMjAxNi0xMS0yOFQxOTAwMTEuMDAwWi9nb29kZWdncy1nYXJiYW56by9vcmRlcl9pdGVtcy5ic29uAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAwMDA2NDQAMDAwMTc1MQAwMDAxNzUxAIAAAAAAAAACfZ6FHjEzMDE3MTAxMDc1ADAyMzMxNgAgMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB1c3RhciAgAG1hZG1pbgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAYWRtaW5zAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=

which it claims to parse and tells me is {name: 'rs-ds059548_2016-11-28T190011.000Z/goodeggs-garbanzo/order_items.bson', size: 2} ... but the size according to bsdtar on OSX is 10697475358 ...

and then it chokes parsing the next tar header (presumably because it has used the entirely wrong offset).

If I can provide more data (without providing the file itself), please let me know.

Invalid header

Hi,
I must generate a dynamic tar file from an S3 directory and, as Matteo Collina suggested, I'm using your fantastic module together with pump in the following way:

// this contain all files present in the directory
var files = [];
async.eachSeries(files, function(data, callback){
     // is an s3 file stream created with this module https://github.com/jb55/s3-blob-store
    var stream = store.createReadStream({ key: data.Key });
    var pack = tar.pack();
    var entry = pack.entry({ name: data.Key, size:data.Size }, function(err){
        if (err) console.log(err);
    });

    pump(stream, entry, function(err){
        if (err) console.log(err);
        if (tmp_count === files)
          pack.finalize();
        callback();
    })

    pump(pack, res, function(err){
        if (err) console.log(err);
    })
}, function(err){
    if (err) console.log(err);
    req.end();
})

this code generate the following stack trace:

Error: invalid header
    at Object.exports.decode (/tar-fs/node_modules/tar-stream/headers.js:205:40)
    at onheader (/tar-fs/node_modules/tar-stream/extract.js:103:39)
    at Extract._write (/tar-fs/node_modules/tar-stream/extract.js:206:8)
    at Extract._continue (/tar-fs/node_modules/tar-stream/extract.js:170:28)
    at oncontinue (/tar-fs/node_modules/tar-stream/extract.js:61:10)
    at ondrain (/tar-fs/node_modules/tar-stream/extract.js:81:5)
    at Extract._write (/tar-fs/node_modules/tar-stream/extract.js:206:8)
    at doWrite (/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:237:10)
    at writeOrBuffer (/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:227:5)
    at Writable.write (/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:194:11)

What do you think? Am I making errors in my code, or is there a problem?

Regards

Add support for @LongLink

GNU tar uses @LongLink to signal that the next entry has a long name. This is used in the GCC and Linux kernel .tar.gz files, so when using tar-stream to decompress them, some files are not extracted and other times there are EACCES errors.

error on invalid headers

This is great, but it needs to error on invalid headers!

If you can give me some hints I can put in a pull request,
my guess is to error if the checksum is incorrect,
unless the block is all nulls (as you can have null blocks in between files)

Packing example does not run

Apologies, still learning nodejs libraries. The Packing example does not run because myStream is not defined. I assume I am supposed to open a file, get its stream, and assign it to myStream. Could you add that to the example so people can copy, paste, run, then edit?

var tar = require('tar-stream')
var pack = tar.pack() // pack is a streams2 stream

// add a file called my-test.txt with the content "Hello World!"
pack.entry({ name: 'my-test.txt' }, 'Hello World!')

// add a file called my-stream-test.txt from a stream
var entry = pack.entry({ name: 'my-stream-test.txt' }, function(err) {
  // the stream was added
  // no more entries
  pack.finalize()
})
myStream.pipe(entry)

// pipe the pack stream somewhere
pack.pipe(process.stdout)

packing streams

it seems that pack.js line 98 would always fail to pack streams unless the size is set in the data.

I maintain ctalkington/node-archiver and would like to consolidate efforts on the creation of tar archives. would you be open to an alternative method that collects the stream before writing the header? in cases where the size isn't passed along and the source is a stream? if so, i can work up a PR.

how to zip file after tar

const stream = tar.pack()
stream.entry({ name: '/foo/test.txt' }, 'hello');
stream.finalize();

Now I have a tar stream; how do I gzip it?

Modifying tar in place

I perform tar file manipulation in my application. Adding, or replacing files in the tar archive. I built a wrapper that tries to keep it managed.

I'm manipulating tar files the way you suggest in the readme, by opening the tar, putting each entry into a new tar stream and writing that. However I don't really want a new tar file, so I've taken to moving the existing tar file to tmp (using fs.rename) and opening it from there. That way I can always just write to where the file is supposed to be.

It seems that when I try to move a tar to tmp while it is being accessed by tar-stream, I receive an error. So, if two requests happen almost immediately after one another, I get:

Error: ENOENT, rename '/<where-file-is-supposed-to-be>.tar'
    at Error (native)

At the fs.rename call.

What should I do about this? Is there a way to check whether a file operation is currently being performed?

Should I wait in that case using setTimeout?

    fs.rename(self.fullPath, tmpName, function (err) {
      if (err) {

        // TEMP: throw err
        throw err;

Is there a way to manipulate the tar file in place, without moving it? Wouldn't that just cause the same problem? I'm sorry if this is an easy question.

How to read sub folder/files ?

Hi, Sorry to raise issue here but I think you guys maintaining this module would understand this best,
So I thought I might get some help from here.

I am using this module to read a .tgz file, and to read every files's content from this tar file.
I am kind of stuck here

Here is what I am trying to do:

.tgz file structure:

root_folder
|-- _sub_folder1
|        |-- file1
|        |-- file2
|        ....
|-- _sub_folder2
...

(in coffee script) Read every sub folder and file

extract = require('tar-stream').extract()
fs.createReadStream(FILE_PATH).pipe(zlib.createUnzip()).pipe(extract)
    .on 'entry', (header, stream, callback) ->
        console.log "header -->", header.name, header.size, header.type
        if header.type == "directory"
             #go inside this directory and find all files
             #read content of every file......
             #           what should I do here ??
        else if header.type == "file"
             #read content of  this  file......

        stream.resume()
    .on 'error', ->
        console.log "error"
    .on 'finish', ->
        console.log "finished"

output:

header --> offline_2014-08-06_16:54:28/ 0 directory
this entry end

Can finalize be called before entry streams are done?

Ideally, I would think I could add as many entries with streams as I want, then call finalize right afterward, and everything would work (i.e. it would automatically wait for all the input streams to complete before actually creating the package). The documentation seems to imply that this isn't the case, though. Can I or can't I do that? If not, why not? Can we make it so finalize can be called without explicitly waiting for the streams?

Follow symlinks

Is there an option to follow symlinks, similar to tar's -L or -h option?

extract halts unexpectedly if predecessor stream is unzipped

hi,

i encountered a bug with this package:

  • if used with gunzip-maybe or zlib.createGunzip() the output stops after a few file entries
  • if the content is pre-unzipped (the same tar.gz file stored uncompressed) everything goes well
  • if the stream is consumed in the extract.on('entry') handler everything goes well

https://gist.github.com/chpio/6d0cedae59d8416d0aed

Ohh, I don't know if it's relevant, but I noticed the stops occur after a large file entry.

TypeError: Cannot read property 'corked' of undefined

when trying to do the following:

            var readStream = fs.createReadStream(tarballPath);
            var extractStream = tar.extract(nodeModulesPath);

            readStream
                .pipe(zlib.createGunzip())
                .pipe(extractStream);

            readStream.on('error', callback);
            extractStream.on('error', callback);
            extractStream.on('finish', callback);

I keep getting the following error in some cases. The only difference between the failing and non-failing cases is whether the tarball already exists on disk before running the script (doesn't fail) or I create the tarball and try to extract it, but only after the finish event on tar.pack(nodeModulesPath) (fails). I've run a fs.existsSync call before the code above to confirm that the tarball exists and the node modules path does not.

TypeError: Cannot read property 'corked' of undefined
    at Writable.end (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:429:12)
    at emptyStream (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:18:5)
    at onheader (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:135:34)
    at Extract._write (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:207:8)
    at doWrite (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:279:12)
    at writeOrBuffer (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:266:5)
    at Writable.write (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:211:11)
    at write (_stream_readable.js:601:24)
    at flow (_stream_readable.js:610:7)
    at Gunzip.pipeOnReadable (_stream_readable.js:642:5)

This appears to be caused by the fact that _writableState is on the _parent property of the Source instance and not on the stream object itself.

I tried adding a line to the source instantiation to pass this property through so it's available in s.end(), but then I get the following error:

Error: write after end
    at writeAfterEnd (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:161:12)
    at Writable.write (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:208:5)
    at Writable.end (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/node_modules/readable-stream/lib/_stream_writable.js:426:10)
    at Extract._write (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:203:12)
    at Extract._continue (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:171:28)
    at oncontinue (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:62:10)
    at onheader (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:143:5)
    at Extract._write (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:207:8)
    at Extract._continue (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:171:28)
    at oncontinue (/Users/andrewdeandrade/code/unpm-install/node_modules/tar-fs/node_modules/tar-stream/extract.js:62:10)

My first intuition was that https://github.com/mafintosh/tar-stream/blob/master/extract.js#L31 should read PassThrough.call(this, self);, but that didn't fix the problem either (though it still might be something you want to add).

I tried checking the value of _writableState all the way through the instantiation stack. It's defined at the end of the Writable() constructor function, but is undefined after the Writable.call(this, options) in the Duplex constructor.

Any ideas on what could be causing this and how to fix it?

How do errors work?

It doesn't look like a tar pack is an EventEmitter, which would mean there's no 'error' event. So where do errors go? How do you handle them?

Bad file type

Hi,

With the following archive:
http://download.oracle.com/otn-pub/java/jdk/7u79-b15/server-jre-7u79-windows-x64.tar.gz
the file type of the first entry is wrong. It should be a directory, not a file:
jdk1.7.0_79/ file
jdk1.7.0_79/COPYRIGHT file
jdk1.7.0_79/LICENSE file
jdk1.7.0_79/README.html file
jdk1.7.0_79/release file

Here's my small program:

var tar = require('tar-stream');
var fs = require('fs');

var extract = tar.extract();
extract.on('entry', function(header, stream, callback) {
  console.log(header.name + ' ' + header.type);
  stream.on('end', function() {
    callback() // ready for next entry
  })
  stream.resume() // just auto drain the stream
});

extract.on('finish', function() {
  // all entries read
  console.log('finish extract');
});

var readStream = fs.createReadStream('jdk.tar');
readStream.on('error', function(err) {
        console.log('read error', err);
});

readStream.pipe(extract);

Pack and serve a glob directory as .tar.gz

I had a bit of trouble figuring out how to serve a directory as a .tar.gz in an Express app. Here is a snippet showing how I accomplished it.

As a gist, https://gist.github.com/MadLittleMods/7eedb4001c52acec104e91dbd80618b5

const Promise = require('bluebird');
const path = require('path');
const fs = require('fs-extra');
const stat = Promise.promisify(fs.stat);
const glob = Promise.promisify(require('glob'));
const tarstream = require('tar-stream');
const zlib = require('zlib');
const express = require('express');

function targzGlobStream(globString, options) {
  const stream = tarstream.pack();

  const addFileToStream = (filePath, size) => {
    return new Promise((resolve, reject) => {
      const entry = stream.entry({
        name: path.relative(options.base || '', filePath),
        size: size
      }, (err) => {
        if (err) return reject(err);
        resolve();
      });

      fs.createReadStream(filePath)
        .pipe(entry);
    });
  };

  const getFileMap = glob(globString, Object.assign({ nodir: true }, options))
    .then((files) => {
      const fileMap = {};
      const stattingFilePromises = files.map((file) => {
        return stat(file)
          .then((fileStats) => {
            fileMap[file] = fileStats;
          });
      });

      return Promise.all(stattingFilePromises)
        .then(() => fileMap);
    });


  getFileMap.then((fileMap) => {
      // We can only add one file at a time
      return Object.keys(fileMap).reduce((promiseChain, file) => {
        return promiseChain.then(() => {
          return addFileToStream(file, fileMap[file].size);
        });
      }, Promise.resolve());
    })
    .then(() => {
      stream.finalize();
    })
    .catch((err) => {
      stream.emit('error', err);
    });

  return stream.pipe(zlib.createGzip());
}

const app = express();

app.get('/logs.tar.gz', function (req, res) {
  const logDirPath = path.join(process.cwd(), './logs/');
  const tarGzStream = targzGlobStream(path.join(logDirPath, '**/*'), {
    base: logDirPath
  });

  res
    .set('Content-Type', 'application/gzip')
    .set('Content-Disposition', 'attachment; filename="logs.tar.gz"');

  tarGzStream.pipe(res);
});

Thanks to #64, #25

No stream callback on a specific combination of files and sizes

Hey, I've experienced a nasty bug. When an input stream of length A is supplied after content of length B has already been added directly, the stream callback never fires. Working example:

var fs = require("fs");
var tar = require('tar-stream');

var fileName = "demo";
var pack = tar.pack();

fs.writeFileSync(fileName, new Array(1674).join("X"), "utf-8");
pack.entry({ name: "specific-length.txt" }, new Array(13399).join("X"));

fs.stat(fileName, function (err, stat) {
    if (err) {
        return console.log(err);
    }

    var packOptions = {
        mode: stat.mode,
        mtime: stat.mtime,
        name: fileName,
        size: stat.size
    };

    var rs = fs.createReadStream(fileName);
    var entry = pack.entry(packOptions, function (err) {
        console.log("This never happens");

        pack.finalize();
        pack.pipe(fs.createWriteStream("output.tar"));
    });
    console.log("We pipe the stream here and expect a callback...");
    return rs.pipe(entry);
});

13399 followed by 1674 are the lengths I stumbled upon; I presume the problem recurs at specific intervals based on stream buffer sizes and such. Looking into the source, there seems to be a disconnect between the Sink's and the Pack's drain dynamics: the callback is saved but never called. I didn't understand the code well enough to actually fix it. :-(

Tested on node v0.10.22, tar-stream 0.2.5.

Tar stream never closes

Hey there, the pack stream's .pipe() doesn't work as I expect; I expected behavior similar to fs.createReadStream().pipe(). The problem is that it never closes.

So for example, this program using tar-stream will hang.

var tar = require('tar-stream'),
    spawn = require('child_process').spawn

var pack = tar.pack();

pack.entry({name: 'hello.txt'}, 'Hello world!')

var cat = spawn('cat');

pack.pipe(cat.stdin);
cat.stdout.pipe(process.stdout);

pack.on('end', function () {
    console.log('This is never fired.');
});

While this program with fs.createReadStream will close as expected.

var fs = require('fs'),
    spawn = require('child_process').spawn

var fileStream = fs.createReadStream('test.js');

var cat = spawn('cat');

fileStream.pipe(cat.stdin);
cat.stdout.pipe(process.stdout);

fileStream.on('end', function () {
    console.log('This does fire!');
});

Writing object as entry

Bumped my head against the wall for a while until I figured out that object serialization doesn't happen automatically when writing objects to an entry.

We could do something similar to

if (typeof buffer === 'string') buffer = new Buffer(buffer)
and call JSON.stringify on the object to prevent the user from committing an empty buffer to the file?

Packing go binaries prevents entry callback from being called

This is the strangest bug, but I've narrowed it down to go binaries. tar-stream seems to silently fail to add them. Anything else I add to the tar works fine.

Steps to reproduce:

Create file test.go:

package main

import "fmt"

func main() {
	fmt.Println("test")
}

Build go file:

$ go build test.go

Create file test.js:

var fs = require("fs");
var tar = require('tar-stream');

var fileName = "./test";
var pack = tar.pack();


fs.stat(fileName, function (err, stat) {
    if (err) {
        return console.log(err);
    }

    var packOptions = {
        mode: stat.mode,
        mtime: stat.mtime,
        name: 'test',
        size: stat.size
    };

    var rs = fs.createReadStream(fileName);
    var entry = pack.entry(packOptions, function (err) {
        console.log("This happens");
        pack.finalize();
    });
    console.log("We pipe the stream here and expect a callback...");
    return rs.pipe(entry);
});

Expected result: "This happens" gets logged and the pack is finalized.
Actual result: "This happens" is never logged and the pack is not finalized.
