Apologies in advance, as this is more a question to help my understanding than an actual issue (I think).
I have a read stream created with createReadStream from the 'webhdfs' package. I can read a stream of JSON from that filesystem no problem via 'data' chunk events, and I can serve up the full contents of the file via an Express endpoint. All good. The problem is when the file on that filesystem is LZO-compressed.
I've tried a few different ways, and depending on how I code it I either get a segmentation fault, or a complaint that the buffer has to be an array, or just the raw buffer output, etc.
The most recent attempt, which I feel is close to what I'm supposed to be doing, looks like this:
var remoteFileStream = hdfs.createReadStream('/user/me/telemetry_hdfs.' + req.params.hdfsfile);
var data = new Buffer();

remoteFileStream.on('data', function onChunk (chunk) {
  // Concatenate the chunks into the buffer
  data = Buffer.concat([data, chunk]);
  console.log(data);
})
Then I'd go on to decompress using lzo when the full data has been received.
The problem is I can't get past this error:
TypeError: First argument must be a string, Buffer, ArrayBuffer, Array, or array-like object.
at fromObject (buffer.js:280:9)
at Function.Buffer.from (buffer.js:106:10)
at new Buffer (buffer.js:85:17)
at /home/centos/telemetry/new_hdfstest.js:36:14
at Layer.handle [as handle_request] (/home/centos/telemetry/node_modules/express/lib/router/layer.js:95:5)
at next (/home/centos/telemetry/node_modules/express/lib/router/route.js:137:13)
at Route.dispatch (/home/centos/telemetry/node_modules/express/lib/router/route.js:112:3)
at Layer.handle [as handle_request] (/home/centos/telemetry/node_modules/express/lib/router/layer.js:95:5)
at /home/centos/telemetry/node_modules/express/lib/router/index.js:281:22
at param (/home/centos/telemetry/node_modules/express/lib/router/index.js:354:14)
If I instantiate the Buffer() with an empty string inside, I don't error out, so I know the bare "new Buffer()" call is at least part of the problem. Here is what I'm talking about:
var data = new Buffer(''); /// <<<< ------ HERE
var rawchunk;
var chunklength = 0;

remoteFileStream.on('data', function onChunk (chunk) {
  // Do something with the data chunk
  chunklength += chunk.length;
  console.log(chunklength);
  data = Buffer.concat([data, chunk]);
})
But when I console out what "data" is after the read has finished, it looks like I only have one of the buffer chunks, even though I know the read fully completes. Here is what the output looks like:
....
18518867
18584403
18649939
18715475
18781011
18846547
18912083
18977619
19043155
19108691
19174227
19239763
19305299
19370835
19436371
19501907
19567443
19632979
19698515
19764051
19829587
19895123
19960659
20026195
20091731
20157267
20222803
20288339
20353875
20419411
20428832
<Buffer 89 4c 5a 4f 00 0d 0a 1a 0a 10 20 20 30 09 40 01 05 03 00 00 0d 00 00 00 00 59 23 90 aa 00 00 00 00 00 1e e1 02 96 00 04 00 00 00 00 65 39 5f fb 32 25 ... >
read is fully complete
The code snippet that produced that output, including the on('finish') handler, was this:
var data = new Buffer('');
var chunklength = 0;

remoteFileStream.on('data', function onChunk (chunk) {
  // Do something with the data chunk
  chunklength += chunk.length;
  console.log(chunklength);
  data = Buffer.concat([data, chunk]);
})

remoteFileStream.on('finish', function onFinish () {
  // Read is done
  console.log(data)
  console.log('read is fully complete')
});
I was thinking I would put the lzo.decompress() call in that last on('finish') handler, but the fact that console.log(data) only produces one line makes me think something is wrong. Sure enough, I get a segmentation fault when I try to decompress it! Here is the code where I attempt the actual lzo.decompress():
remoteFileStream.on("finish", function onFinish () {
  // Read is done
  console.log('read is fully complete')
  var decompressed = lzo.decompress(data)
});
I've also tried putting a .pipe(lzo.decompress()) at the end of hdfs.createReadStream(), and that fails in other ways. It would be cool to figure out how to do it that way, but I feel like I should at least be able to get this buffer approach working. I know this is not an "issue" with the LZO package so much as a question about how it can be used with a buffer (and about buffers in general), so I appreciate the patience in advance!!
Thanks!
Chris