nika-begiashvili / libarchivejs

285.0 10.0 35.0 4.68 MB

Archive library for browsers

License: MIT License

HTML 2.98% Shell 0.41% C 1.61% JavaScript 85.58% Python 2.46% Dockerfile 1.48% TypeScript 5.47%

archive browser zip rar tar gzip 7zip extract webassembly bzip2

libarchivejs's Introduction

Libarchivejs

Overview

Libarchivejs is an archive tool for the browser and Node.js that can extract and create archives in various formats. It is a port of libarchive to WebAssembly with a JavaScript wrapper that makes it easier to use. Because it runs on WebAssembly, performance should be near native. Supported formats: ZIP, 7-Zip, RAR v4, RAR v5, TAR, etc. Supported compression: GZIP, DEFLATE, BZIP2, LZMA, etc.

Version 2.0 highlights!

Create archives
Use it in NodeJS

How to use

Install with npm i libarchive.js and use it as an ES module.

The library consists of two parts: an ES module and a webworker bundle. The ES module is your interface to the library; use it like any other module. The webworker bundle lives in the libarchive.js/dist folder. It is already bundled and will not be picked up by your bundler, so make sure it is available in your public folder and pass the correct path to the Archive.init() method.

If the libarchive.js file is in the same directory as the bundle file, then you don't need to call Archive.init() at all.

import {Archive} from 'libarchive.js/main.js';

Archive.init({
    workerUrl: 'libarchive.js/dist/worker-bundle.js'
});

document.getElementById('file').addEventListener('change', async (e) => {
    const file = e.currentTarget.files[0];

    const archive = await Archive.open(file);
    let obj = await archive.extractFiles();
    
    console.log(obj);
});

// outputs
{
    ".gitignore": {File},
    "addon": {
        "addon.py": {File},
        "addon.xml": {File}
    },
    "README.md": {File}
}
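
The nested object above can be flattened into a list of full paths with a small recursive walk. The sketch below is not part of libarchive.js: flattenFiles is a hypothetical helper, and it assumes that folders are plain objects while files are instances of some other class (such as File).

```javascript
// Hypothetical helper (not part of libarchive.js): walk the nested object
// returned by extractFiles() and collect { path, file } pairs.
// Assumption: folders are plain objects; anything else is a file.
function flattenFiles(tree, prefix = "") {
  const out = [];
  for (const [name, value] of Object.entries(tree)) {
    const isFolder =
      value !== null &&
      typeof value === "object" &&
      Object.getPrototypeOf(value) === Object.prototype;
    if (isFolder) {
      // Recurse into the folder, extending the path prefix.
      out.push(...flattenFiles(value, `${prefix}${name}/`));
    } else {
      out.push({ path: prefix + name, file: value });
    }
  }
  return out;
}
```

For the example output above, this would yield the paths ".gitignore", "addon/addon.py", "addon/addon.xml" and "README.md".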

More options

To get a file listing without decompressing the archive, use one of these methods:

    await archive.getFilesObject();
    // outputs
    {
        ".gitignore": {CompressedFile},
        "addon": {
            "addon.py": {CompressedFile},
            "addon.xml": {CompressedFile}
        },
        "README.md": {CompressedFile}
    }

    await archive.getFilesArray();
    // outputs
    [
        {file: {CompressedFile}, path: ""},
        {file: {CompressedFile},   path: "addon/"},
        {file: {CompressedFile},  path: "addon/"},
        {file: {CompressedFile},  path: ""}
    ]

If these methods are called after archive.extractFiles(), they will contain the actual files as well.

Decompression might take a while for larger files. To track each file as it gets extracted, archive.extractFiles accepts a callback:

    archive.extractFiles((entry) => { // { file: {File}, path: {String} }
        console.log(entry);
    });
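
Each entry carries a File, so its contents can be read with the standard Blob methods. The sketch below assumes that entry.file is a regular File/Blob, as the {File} placeholders suggest; readEntryBytes is a hypothetical helper name.

```javascript
// Sketch: read the raw bytes of one extracted entry.
// Assumption: entry.file is a standard File/Blob exposing arrayBuffer().
async function readEntryBytes(entry) {
  const buffer = await entry.file.arrayBuffer();
  return new Uint8Array(buffer);
}

// Hypothetical usage inside the callback:
// archive.extractFiles((entry) => {
//   readEntryBytes(entry).then((bytes) => {
//     console.log(entry.path, entry.file.name, bytes.length);
//   });
// });
```

For text files, entry.file.text() returns the decoded string directly.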

Extract single file from archive

To extract a single file from the archive you can use the extract() method on the returned CompressedFile.

    const filesObj = await archive.getFilesObject();
    const file = await filesObj['.gitignore'].extract();

Check for encrypted data

    const archive = await Archive.open(file);
    await archive.hasEncryptedData();
    // true - yes
    // false - no
    // null - can not be determined
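
Because null is falsy, a plain `if (!result)` check would treat "cannot be determined" the same as "not encrypted". A minimal sketch of handling all three outcomes explicitly (encryptionStatus is a hypothetical helper, not a library function):

```javascript
// Hypothetical helper: map the tri-state result of hasEncryptedData()
// to an explicit label so that null is not mistaken for false.
function encryptionStatus(result) {
  if (result === true) return "encrypted";
  if (result === false) return "not-encrypted";
  return "unknown"; // null: could not be determined
}

// Hypothetical usage:
// const status = encryptionStatus(await archive.hasEncryptedData());
// if (status !== "not-encrypted") { /* ask the user for a password */ }
```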

Extract encrypted archive

    const archive = await Archive.open(file);
    await archive.usePassword("password");
    let obj = await archive.extractFiles();

Create new archive

Note: pathname is optional in browser but required in NodeJS

    const archiveFile = await Archive.write({
        files: [
            { file: file, pathname: 'folder/file.zip' }
        ],
        outputFileName: "test.tar.gz",
        compression: ArchiveCompression.GZIP,
        format: ArchiveFormat.USTAR,
        passphrase: null,
    });

Use it in NodeJS

    import fs from "fs";
    import { Archive, ArchiveCompression, ArchiveFormat } from "libarchive.js/dist/libarchive-node.mjs";
    
    let buffer = fs.readFileSync("test/files/archives/README.md");
    let blob = new Blob([buffer]);

    const archiveFile = await Archive.write({
      files: [{ 
        file: blob,
        pathname: "README.md",
      }],
      outputFileName: "test.tar.gz",
      compression: ArchiveCompression.GZIP,
      format: ArchiveFormat.USTAR,
      passphrase: null,
    });

How it works

Libarchivejs is a port of the popular libarchive C library to WASM. Since WASM runs in the current thread, the library uses WebWorkers for the heavy lifting. The ES module (Archive class) is just a client for the WebWorker. It's tiny and doesn't take up much space.

The web worker is spawned and the WASM module downloaded only when you actually open an archive file. Each Archive.open call gets its own WebWorker.

After extractFiles is called, the worker is terminated to free up memory. The client will still work with cached data.

libarchivejs's Issues

Code error when using 'Archive.init({ workerUrl: 'libarchivejs/dist/worker-bundle.js' })'

working example for creating and writing archive files in node and browser

Any working example present for creating and writing archive files in node and browser? #54

Error: Buffer exhausted

I'm working with libarchive.js in the browser, and the worker seems to load correctly. When I try to create a ZIP using this bit of code:

I get Error: Buffer exhausted

And in the worker reference, I get this code:

I should add that I'm unsure which compression and format values work together, and how to get a ZIP file.

[QUESTION] Can I use this lib to create zip files?

Way to extract just one file from an archive?

You can call getFilesObject and getFilesArray to get a list of all of the files in an archive, but it seems like the only option to extract any files is to call extractFiles which extracts all of the files.

Would it be possible to add a way to only extract one of the files in the array?

Terminate worker manually

After calling an extractFiles worker, it will be terminated to free up memory. The client will still work with cached data.

I need to use extractFile instead. How can I free up memory without calling extractFiles?

Something like archive.close() would be nice to have.

'worker-bundle' file always throws an error like 'Uncaught SyntaxError: Unexpected token '<''

My Vue project needs to unzip some zip files, so I imported your 'libarchive.js' to do this, but the browser always throws an error like Uncaught SyntaxError: Unexpected token '<'.
Here is my code:

Archive.init('../../node_modules/libarchive.js/dist/worker-bundle.js');
const archive = await Archive.open(zipFile);
let obj = await archive.extractFiles();
console.log(obj);

zipFile is my zip file. Whether the init path is correct or empty, this error is always thrown.
My English is not good; I am waiting for your reply.

Streaming API

I am considering using this library with big files (read: archives >4GB). Is it possible to stream the output of a file extraction without storing it in memory? Otherwise I'll probably end up with multiple GB of RAM usage just to hold the data that the library extracted.

How can I get unzipped file data from entry data?

{File} contains only file name, date, etc.

How can I get file data?

archive.extractFiles((entry) => { // { file: {File}, path: {String} }
        console.log(entry);
});

Support Unicode in file names

Unicode characters are not detected, so they are replaced in the path name

À.png --> *.png

See: No way to retrieve filename in UTF-8 without setting locale #587

how to create archive?

README doesn't tell us how to create an archive.

trying to use libarchivejs to decode a pure bzip2 stream

Hi,

We have a stream/byte array which doesn't have an archive wrapper (i.e. it's not wrapped in a zip archive - it's just a bzip2 compressed file). How can we use libarchivejs to decompress this stream/byte array?

Typescript support

Do you plan on generating declaration files for supporting Typescript?
You could either have them as part of the repo or contribute to https://github.com/DefinitelyTyped/DefinitelyTyped/

Thanks in advance!

Can't extract gz

Version: 1.3.0
Symptom: extracting a gz file fails with Memory read error -30

I found a related issue and merge request in libarchive; it was fixed in v3.4.0.

Do you have any plan to upgrade?

Thanks

Decompress single compressed .gz file

Can someone give me an example of how to decompress a file using libarchive?
It is just a .gz file but it's not a tar archive. I'm assuming libarchive can do that as well.

I get an "Unrecognized archive format" error when I try to call getFilesArray after opening.

Cannot extract encrypted 7Z archive with encrypted filenames

Hi,
I am trying to extract a 7Z archive with encrypted filenames. Upon calling getFilesObject or extractFiles all I get is an empty object back. Is this feature not (yet?) supported?

I am attaching such an archive. The password is 'abc' (without quotes). I had to append '.zip' to the filename to enable attaching to this issue. Upon downloading, rename it to make sure .7z is its extension.

Thanks!

PS: If I don't choose to encrypt the filenames (just the contents) then everything works as expected.

TwoFileArchive-EncFNs-pass-is-abc.7z.zip

Promise.withResolvers

When trying to create an archive in the browser (Chrome v120), I get:

ERROR TypeError: Promise.withResolvers is not a function

This is probably because it is not yet supported in most browsers. I suggest an alternative implementation like:

let resolve, reject;

const promise = new Promise((res, rej) => {
  resolve = res;
  reject = rej;
});

I know it doesn't look as nice in the code, but it's going to have much better browser support.
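
The suggestion above can be wrapped in a small fallback that prefers the native method when it exists. This is a sketch only; withResolvers is a hypothetical local helper, not something the library exports.

```javascript
// Fallback sketch: use the native Promise.withResolvers when present,
// otherwise build the same { promise, resolve, reject } triple by hand.
function withResolvers() {
  if (typeof Promise.withResolvers === "function") {
    return Promise.withResolvers();
  }
  let resolve, reject;
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}
```

Calling code can then destructure `const { promise, resolve, reject } = withResolvers();` regardless of browser support.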

Failed to execute 'postMessage' on 'Worker': Response object could not be cloned

Hello, I would like to unzip a .7z archive from my Ionic application. I had a problem identical to this one, which I solved by putting the libarchive.js folder in my public directory, but since then I get this error in the console. What can I do?

How to get the time of the files in the compressed package

How to get the time of the files in the compressed package
archive_entry_mtime?

How to know whether an archive file is encrypted or not?

It seems that encrypted archive files cannot be extracted.

How can I know whether an archive file is encrypted (or can be extracted) before I call the extract function?

How to extract to local directory?

Sorry for asking a newbie question again, but it keeps bugging me.
I can list the files inside an archive and log them to the console, but I can't see the files in my local directory. Am I missing something?

This is my code

function finish() {
    const d = document.createElement('div');
    d.setAttribute('id', 'done');
    d.textContent = 'Done.';
    document.body.appendChild(d);
}

document.getElementById('file').addEventListener('change', async e => {
    let obj = null;

    try {
        const file = e.currentTarget.files[0];
        const archive = await Archive.open(file);
        //obj = await archive.extractFiles();
        await archive.extractFiles();
        //console.log(obj);
    } catch (err) {
        console.error(err);
    } finally {
        //window.obj = obj;
        finish();
    }
});

Help please, thanks.

Multiple compressedFile.extract() return wrong content

Problem
Concurrent compressedFile.extract() calls break the correspondence between files and their contents.
In the current implementation, when a message comes from the webworker, the last element of _callbacks is always called. Multiple concurrent extractSingleFile calls cause _callbacks to stack up, so contents are returned in the wrong order.
How to reproduce

Modify test/files/test-single.html :

@@ -65,7 +65,7 @@
                     const file = e.currentTarget.files[0];
                     const archive = await Archive.open(file);
                     const files =  await archive.getFilesArray();
-                    fileObj = await files[0].file.extract();
+                    fileObj = (await Promise.all(files.map(f =>f.file.extract()) ))[0];
                 }catch(err){
                     console.error(err);
                 }finally{

and npm run test

How to fix
Attach a message id to each message sent to the WebWorker and route responses by that id.
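
A minimal sketch of what such id-based routing could look like. WorkerClient and the { id, payload, error } message shape are hypothetical and are not the library's actual protocol; the worker only needs postMessage and onmessage.

```javascript
// Sketch of id-based request/response routing over a Worker-like object.
// Every name and the message shape here are assumptions for illustration.
class WorkerClient {
  constructor(worker) {
    this.worker = worker;
    this.nextId = 0;
    this.pending = new Map(); // id -> { resolve, reject }
    worker.onmessage = (event) => {
      const { id, error, payload } = event.data;
      const entry = this.pending.get(id);
      if (!entry) return; // unknown or already settled id
      this.pending.delete(id);
      if (error) entry.reject(new Error(error));
      else entry.resolve(payload);
    };
  }

  // Send a request tagged with a fresh id; the promise settles when
  // the response carrying the same id arrives, in any order.
  request(payload) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.worker.postMessage({ id, payload });
    });
  }
}
```

With this shape, concurrent calls such as Promise.all(files.map(...)) no longer depend on the order in which the worker replies.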

ISO file creation returns an archive size of 0, while gzip creation works fine

When attempting to create an ISO file using libarchive.js, the returned archive size is 0 bytes.
However, when creating a gzip file with the same process and files, the output is correct and functional.
Below is the code snippet used for creating the ISO file:

import {
  Archive,
  ArchiveFormat,
  ArchiveCompression,
} from 'libarchive.js/dist/libarchive.js';

// `files` is the files of the input of the file type.
const allFiles = [];
for (let i = 0; i < files.length; i++) {
  const file = files[i];
  const relativePath =
    file.webkitRelativePath || file.relativePath || file.name;
  allFiles.push({ file, pathname: relativePath });
}

const archiveFile = await Archive.write({
  files: allFiles,
  outputFileName: 'mount.iso',
  compression: ArchiveCompression.NONE,
  format: ArchiveFormat.ISO9660,
  passphrase: null,
});

Steps to Reproduce:

Prepare a list of files to be included in the archive (allFiles).
Use the above code snippet to attempt to create an ISO file.
Check the size of the generated mount.iso file.

Expected Behavior:

The ISO file should be created with the appropriate size, containing all the specified files.

Actual Behavior:

The generated ISO file (mount.iso) has a size of 0 bytes.

Additional Information:

The same process works correctly when creating a gzip file.
The issue seems to be specific to the ISO creation format.
No errors or warnings are thrown during the process.

Environment:

Library Version: [email protected]
Browser: Chrome 125.0.6422.113
Operating System: windows 10

When libarchivejs parses Chinese names, they are replaced by *

Issue with zip created by mac (with __MACOSX folder)

When I extract the following file, test.zip, using native tools (on Ubuntu), all files can be opened and used.

When I extract the same file using libarchivejs, all of the files under the __MACOSX folder work properly, however all other files are corrupted (unable to open them in their respective formats).

Do you have any guidance on how to fix this?

Support zstd?

ZSTD support exists for tar, could you support it?

Better Bundling support

Passing the URL to worker may not work due to bundlers mangling file names:

this._worker = new Worker(options.workerUrl);

Maybe that could be solved by allowing passing the worker directly:

this._worker = options.worker
    ? options.worker
    : new Worker(options.workerUrl);

Is order of callbacks wrong?

I found that callbacks are called by first-in-last-out.
Should it be first-in-first-out?

Can it be used with Node.js?

Hi.

Are there any plans to make it available on Node.js?

I think that, since it is WASM, porting to Node.js should be easy.
But I'm not sure.

I hope it can be used with Node.js.

Sort order of getFilesArray()

Would it be possible to add an option to sort the array returned by this function inside the worker?

Unsupported block header size (was 5, max is 2)

Hello,

I come from Vietnam, thank you for developing and sharing such a wonderful library. However, I encountered an issue with a RAR file with the password: "Unsupported block header size (was 5, max is 2)".

I have used await archive.hasEncryptedData(); to check, but the result returned null instead of true.

I am using version v1.3.0 in an Angular application. I would greatly appreciate your assistance!

Best regards,
Anh Duc Le

Archive.open() is freezing and gives no error or results

import { Archive } from 'libarchive.js/main.js';

Archive.init({
    workerUrl: 'libarchive.js/dist/worker-bundle.js'
});


export const handleFile = async (file) => {
  console.log('file: ', file)
  console.log('_7zOpen !! ')

  const archive = await Archive.open(file);
  console.log('archive: ', archive)

  const filesObject = await archive.getFilesObject();
  console.log("filesObject: ", filesObject)

  const filesArray = await archive.getFilesArray();
  console.log("filesArray: ", filesArray)

  return filesArray
}

This runs, but the console output only prints file and _7zOpen !! and then stops there without any further response. No error is thrown, and the line console.log('archive: ', archive) never gets executed.

console log output:

file:  File {path: 'CE027001-120011101924-T100.7z', 
name: 'CE027001-120011101924-T100.7z', 
lastModified: 1599377977254, 
lastModifiedDate: Sun Sep 06 2020 16:39:37 GMT+0900, webkitRelativePath: '', …}
lastModified: 1599377977254
lastModifiedDate: Sun Sep 06 2020 16:39:37 GMT+0900 {}
name: "CE027001-120011101924-T100.7z"
path: "CE027001-120011101924-T100.7z"
size: 75602083
type: "application/x-7z-compressed"
webkitRelativePath: ""
[[Prototype]]: File

_7zOpen !!

I suspect that the Archive object is never really running despite the fact that it is installed and imported with no problem.

What is going on?
How can I move forward to debug this issue?

wasm streaming compile failed: TypeError: WebAssembly: Response has unsupported MIME type

Hello.
Trying out this lib for the first time and I'm getting errors. At first I tried linking through some CDN but the workerUrl did not like loading remote content. I then downloaded "Latest Release" and unpacked everything into a folder (edwardleuf.org/js/libarchivejs...) and tried again, but I get this compile error. Searching around tells me I need to add the wasm mime type to a server config file, but I don't have that access. I also don't have npm access so that is why I did not install it in that way.

Current error message:

"wasm streaming compile failed: TypeError: WebAssembly: Response has unsupported MIME type '' expected 'application/wasm'"        [worker-bundle.js:1:69897]
"falling back to ArrayBuffer instantiation"                                                                                       [worker-bundle.js:1:69897]
message: "FileReader.readAsArrayBuffer: Argument 1 is not an object."
stack: "open@https://edwardleuf.org/js/libarchivejs/dist/worker-bundle.js:1:493
self.onmessage@https://edwardleuf.org/js/libarchivejs/dist/worker-bundle.js:1:71325
EventHandlerNonNull*@https://edwardleuf.org/js/libarchivejs/dist/worker-bundle.js:1:71172
@https://edwardleuf.org/js/libarchivejs/dist/worker-bundle.js:1:72016"

Quick implementation for testing purposes:

<html>
<body>
<script type="importmap">
{
	"imports":
	{
		"ARC": "/js/libarchivejs/main.js"
	}
}
</script>
<script type="module">

import { Archive } from "ARC";
Archive.init({workerUrl: "/js/libarchivejs/dist/worker-bundle.js"});

const arc = await Archive.open("ponedward.7z");

</script>
</body>
</html>

Error accessing libarchive.js

I'm trying to use this libarchive.js package with Node, but I got the following error:
Message:

I tried to make use of require instead of import as well but still no luck!
I'm using node v10.16.0
This is my code:

import {Archive} from 'libarchive.js/main.js';

Archive.init({
    workerUrl: 'libarchivejs/dist/worker-bundle.js'
});

const zipFile = 'zip/files.zip';
 
const files = async (e) => {
    const archive = await Archive.open(zipFile);
    let obj = await archive.getFilesArray();
    
    console.log(obj);
};

Am I doing something wrong?

Create ZIP file

Thanks for the great library.

Is there any way I can create a ZIP file with signature 50 4b 03 04? The configuration below results in 42 5A 68 (bzip2) for example:

const outputFile = await Archive.write({
  files: outFiles,
  outputFileName: "test.epub",
  compression: ArchiveCompression.BZIP2,
  format: ArchiveFormat.ZIP
});
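
To confirm which container a generated file actually uses, its leading bytes can be compared against the two signatures mentioned above. This is a sketch only; detectSignature is a hypothetical helper and does not change what Archive.write produces.

```javascript
// Sketch: classify a byte array by the magic numbers quoted above.
// 50 4B 03 04 -> ZIP local file header, 42 5A 68 -> bzip2 stream.
function detectSignature(bytes) {
  const startsWith = (sig) => sig.every((b, i) => bytes[i] === b);
  if (startsWith([0x50, 0x4b, 0x03, 0x04])) return "zip";
  if (startsWith([0x42, 0x5a, 0x68])) return "bzip2";
  return "unknown";
}

// Hypothetical usage, assuming outputFile is a File/Blob:
// const head = new Uint8Array(await outputFile.slice(0, 4).arrayBuffer());
// console.log(detectSignature(head));
```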

LZMA Corrupted Input Data

I'm struggling to decompress lzma data using this library. The data comes as part of proprietary file.
I'll show you three examples of how I get the data that needs to be decompressed:
I'll add the first 16 bytes so you can see header and the start of the raw data:

1. 5d 00 00 00 04 00 00 68 80 f9 08 72 b3
2. 5d 00 00 00 04 00 38 8f 41 4c 35 9a 6a
3. 5d 00 00 00 04 00 00 68 9a a5 37 83 51

I know that I can successfully decompress this using the lzma utility on Ubuntu only when I add the decompressed size (real size or -1) at offset 5. I can also use this approach to decompress it using another JS package. This is in fact what I have been doing until now. However, the performance is very bad and it's not very well maintained either, which is why I'm trying to migrate.

But when passing this to libarchive.js it will not even recognize that it is compressed using lzma.

So how am I supposed to pass the data in this scenario?
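
The workaround described above (adding a size of -1 at offset 5) can be sketched as follows. In the .lzma ("LZMA alone") layout, the 1-byte properties field and 4-byte dictionary size are followed by an 8-byte uncompressed size, where all 0xFF bytes mean "unknown". addUnknownSizeField is a hypothetical helper, and whether libarchive.js then recognizes the data is not confirmed here.

```javascript
// Sketch: insert an 8-byte "unknown size" field (all 0xFF, i.e. -1) at
// offset 5, turning a 5-byte props + dictionary-size header into the
// 13-byte .lzma header. Input and output are Uint8Arrays.
function addUnknownSizeField(data) {
  const out = new Uint8Array(data.length + 8);
  out.set(data.subarray(0, 5), 0); // properties byte + dictionary size
  out.fill(0xff, 5, 13);           // uncompressed size = unknown
  out.set(data.subarray(5), 13);   // compressed stream follows
  return out;
}
```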

When using libarchivejs to decompress certain ZIP formats, if the compressed package contains files with Chinese names, the decompressed file names will return empty.

Unsorted files on rar archives

getFilesArray() is returning unsorted items for some rar archives.
Example archive: https://github.com/workhorsy/uncompress.js/blob/master/example_rar_5.rar

Any ideas why?