81

Just looking for a simple solution to downloading and unzipping .zip or .tar.gz files in Node.js on any operating system.

Not sure if this is built in or I have to use a separate library. Any ideas? Looking for just a couple lines of code so when the next zip file comes that I want to download in node, it's a no brainer. Feel like this should be easy and/or built in, but I can't find anything. Thanks!

Lance Pollard

11 Answers

95

It's 2017 (October 26th, to be exact).

For an ancient and pervasive technology such as unzip I would expect there to exist a fairly popular, mature node.js unzip library that is "stagnant" and "unmaintained" because it is "complete".

However, most libraries appear either to be completely terrible or to have commits as recently as a few months ago. This is quite concerning... so I've gone through several unzip libraries, read their docs, and tried their examples to try to figure out WTF. For example, I've tried these:

Update 2020: Haven't tried it yet, but there's also archiver

Top Recommendation: yauzl

Works great for a completely downloaded file. Not as great for streaming.

Well documented. Works well. Makes sense.
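Since this recommendation comes without code, here is a minimal sketch of extracting a fully downloaded zip with yauzl. The helper names extractAll and isDirectoryEntry are made up for illustration; the yauzl calls used (open with lazyEntries, readEntry, openReadStream, and the 'entry', 'end' and 'error' events) are the ones its README documents. Treat it as a starting point, not as the library's official recipe.

```javascript
'use strict';

var fs = require('fs');
var path = require('path');

// Directory entries in a zip have names that end with '/'.
function isDirectoryEntry(fileName) {
  return /\/$/.test(fileName);
}

// Hypothetical helper: extract every entry of zipPath into destDir, then call done(err).
function extractAll(zipPath, destDir, done) {
  // Loaded here so the rest of the file can be loaded before `npm install yauzl`.
  var yauzl = require('yauzl');

  yauzl.open(zipPath, { lazyEntries: true }, function (err, zipfile) {
    if (err) { return done(err); }

    zipfile.on('error', done);
    zipfile.on('end', function () { done(null); });

    zipfile.on('entry', function (entry) {
      var target = path.join(destDir, entry.fileName);

      if (isDirectoryEntry(entry.fileName)) {
        fs.mkdir(target, { recursive: true }, function (err) {
          if (err) { return done(err); }
          zipfile.readEntry(); // ask for the next entry
        });
        return;
      }

      // Make sure the parent directory exists before writing the file.
      fs.mkdir(path.dirname(target), { recursive: true }, function (err) {
        if (err) { return done(err); }
        zipfile.openReadStream(entry, function (err, readStream) {
          if (err) { return done(err); }
          var out = fs.createWriteStream(target);
          readStream.on('error', done);
          out.on('error', done);
          out.on('close', function () { zipfile.readEntry(); });
          readStream.pipe(out);
        });
      });
    });

    zipfile.readEntry(); // with lazyEntries, nothing is read until requested
  });
}
```

Usage would look like extractAll('./example.zip', './temp', function (err) { ... }). Entries are processed one at a time, which keeps memory usage flat. Note that done may be called more than once if several streams fail, so a real implementation would want to guard against that.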

2nd Pick: node-stream-zip

antelle's node-stream-zip seems to be the best

Install:

npm install --save node-stream-zip

Usage:

'use strict';

var fs = require('fs');
var path = require('path');
var StreamZip = require('node-stream-zip');

var zip = new StreamZip({
  file: './example.zip'
, storeEntries: true
});

zip.on('error', function (err) { console.error('[ERROR]', err); });

zip.on('ready', function () {
  console.log('All entries read: ' + zip.entriesCount);
  //console.log(zip.entries());
});

zip.on('entry', function (entry) {
  var pathname = path.resolve('./temp', entry.name);
  if (/\.\./.test(path.relative('./temp', pathname))) {
      console.warn("[zip warn]: ignoring maliciously crafted paths in zip file:", entry.name);
      return;
  }

  if ('/' === entry.name[entry.name.length - 1]) {
    console.log('[DIR]', entry.name);
    return;
  }

  console.log('[FILE]', entry.name);
  zip.stream(entry.name, function (err, stream) {
    if (err) { console.error('Error:', err.toString()); return; }

    stream.on('error', function (err) { console.log('[ERROR]', err); return; });

    // example: print contents to screen
    //stream.pipe(process.stdout);

    // example: save contents to file
    fs.mkdir(
      path.dirname(pathname),
      { recursive: true },
      function (err) {
        if (err) { console.error('[ERROR]', err); return; }
        stream.pipe(fs.createWriteStream(pathname));
      }
    );
  });
});

Security Warning:

Not sure if this checks entry.name for maliciously crafted paths that would resolve incorrectly (such as ../../../foo or /etc/passwd).

You can easily check this yourself by evaluating /\.\./.test(path.relative('./to/dir', path.resolve('./to/dir', entry.name))); a result of true means the entry would land outside the target directory.
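The check described above can be wrapped in a small helper. This is only a sketch: resolveEntryPath is a made-up name, and it uses Node's built-in path module only. Instead of the regular expression it looks at the first segment of the relative path, so a harmless name such as notes..txt is not rejected.

```javascript
'use strict';

var path = require('path');

// Hypothetical helper: returns the absolute output path for a zip entry,
// or null when the entry would be written outside destDir.
function resolveEntryPath(destDir, entryName) {
  var root = path.resolve(destDir);
  var target = path.resolve(root, entryName);
  var relative = path.relative(root, target);

  // '' means the entry resolves to destDir itself. A leading '..' segment
  // (or an absolute result, e.g. another drive on Windows) means it escapes.
  if (relative === '' ||
      relative.split(path.sep)[0] === '..' ||
      path.isAbsolute(relative)) {
    return null;
  }
  return target;
}
```

In an 'entry' handler you would call resolveEntryPath('./temp', entry.name) and skip the entry whenever the result is null.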

Pros: (Why do I think it's the best?)

  • can unzip normal files (maybe not some crazy ones with weird extensions)
  • can stream
  • seems to not have to load the whole zip to read entries
  • has examples in normal JavaScript (not compiled)
  • doesn't include the kitchen sink (i.e. url loading, S3, or db layers)
  • uses some existing code from a popular library
  • doesn't have too much senseless hipster or ninja-foo in the code

Cons:

  • Swallows errors like a hungry hippo
  • Throws strings instead of errors (no stack traces)
  • zip.extract() doesn't seem to work (hence I used zip.stream() in my example)

Runner up: node-unzipper

Install:

npm install --save unzipper

Usage:

'use strict';

var fs = require('fs');
var unzipper = require('unzipper');

fs.createReadStream('./example.zip')
  .pipe(unzipper.Parse())
  .on('entry', function (entry) {
    var fileName = entry.path;
    var type = entry.type; // 'Directory' or 'File'

    console.log();
    if (/\/$/.test(fileName)) {
      console.log('[DIR]', fileName, type);
      entry.autodrain(); // discard the entry so the parser can move on
      return;
    }

    console.log('[FILE]', fileName, type);

    // TODO: probably also needs the security check

    entry.pipe(process.stdout/*fs.createWriteStream('output/path')*/);
    // NOTE: To ignore use entry.autodrain() instead of entry.pipe()
  });

Pros:

  • Seems to work in a similar manner to node-stream-zip, but less control
  • A more functional fork of unzip
  • Seems to run in serial rather than in parallel

Cons:

  • Kitchen sink much? Just includes a ton of stuff that's not related to unzipping
  • Reads the whole file (by chunk, which is fine), not just random seeks
coolaj86
  • 60
    Is nobody else surprised that in 2019 there is no built-in function that allows you to extract a `.zip` into a given location? – Felipe Jan 15 '19 at 22:01
  • 2
    I tried adm-zip and yauzl/yazl. yauzl/yazl support a newer version of the zip protocol and are less buggy. – Powpow Feb 13 '19 at 23:12
  • 3
    You might want to add that streaming a zip is technically an invalid use of a zip file. zip was made back in floppy disk days so there is an option to freshen/add new files to an existing zip. Let's say your zip had 3 files A,B,C. Instead of writing the entire zip pkzip just adds new A to the end of the file and then puts a new central directory at the end. So if you stream you'll get 2 A files one of them invalid. – gman Sep 05 '19 at 08:09
  • mkdirp is undefined! – Ricardo G Saraiva Mar 03 '20 at 20:40
  • 6
    @Felipe Hooray! Almost 2021 and it is still painful to deal with zip files in node! – Runsis Oct 06 '20 at 22:06
  • @RicardoGSaraiva Now it's `fs.mkdir(x, { recursive: true })`. – coolaj86 Nov 24 '20 at 05:01
  • 3
    Re the comment about archiver, as of Jan 2021 it's still pack only (it does not unzip/unpack) – xaphod Jan 14 '21 at 18:54
  • unzip cannot work on nodejs v12+, so please use unzipper instead – jiajianrong Jan 28 '21 at 06:51
41

Check out adm-zip.

ADM-ZIP is a pure JavaScript implementation for zip data compression for NodeJS.

The library allows you to:

  • decompress zip files directly to disk or in-memory buffers
  • compress files and store them to disk in .zip format or in compressed buffers
  • update content of/add new/delete files from an existing .zip
Dr1Ku
bryanmac
  • 1
    I was having errors with any zlib implementation that would kick out an "Invalid Block Type" (error code is "Z_DATA_ERROR") error, but windows unzip would work fine on the file. adm-zip also appears to work just fine. – GotDibbs Oct 30 '13 at 15:05
  • 5
    "Invalid or unsupported zip format. No END header found" with cygwin zipped file. Bloargh. – psp Nov 12 '14 at 08:39
  • Can we `unzip` file at server side also with this module? – Manwal Mar 05 '15 at 07:10
  • I get this error using the adm-zip package, that's unworkable for me - https://github.com/cthackers/adm-zip/issues/25 – chrismarx Feb 09 '16 at 04:44
  • how come nobody experienced any problems with the way it extracts the zip to target path? – properchels Oct 12 '18 at 06:18
  • 2
    It cannot unzip large file. E.g, trying to unzip my file with a size `2857964996` throw `RangeError [ERR_INVALID_OPT_VALUE]: The value "2857964996" is invalid for option "size"` – Alexandre Annic Jun 28 '19 at 15:23
  • I have had very bad experience using adm-zip for large files. When I try to unzip a large file on a disk space that is too small, adm-zip throws "Invalid or unsupported zip format. No END header found" instead of ENOSPC. When I try to create a zip file where there is not enough space, adm-zip fails silently instead of throwing ENOSPC. – Novice Apr 26 '20 at 19:54
35

Node has builtin support for gzip and deflate via the zlib module:

var zlib = require('zlib');

zlib.gunzip(gzipBuffer, function(err, result) {
    if(err) return console.error(err);

    console.log(result);
});

Edit: You can even pipe the data directly through e.g. Gunzip (using request):

var request = require('request'),
    zlib = require('zlib'),
    fs = require('fs'),
    out = fs.createWriteStream('out');

// Fetch http://example.com/foo.gz, gunzip it and store the results in 'out'
request('http://example.com/foo.gz').pipe(zlib.createGunzip()).pipe(out);
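One caveat with chained .pipe() calls is that errors from one stream are not forwarded to the next, so a failed download or corrupt data can go unnoticed. A minimal sketch using Node's built-in stream.pipeline, which reports the first error through a single callback, could look like this. The helper name gunzipFile is made up, and it reads from a local file so that it needs no third-party module; an HTTP response stream could take the place of the read stream.

```javascript
'use strict';

var fs = require('fs');
var zlib = require('zlib');
var stream = require('stream');

// Hypothetical helper: gunzip the file at src into dest, then call done(err).
function gunzipFile(src, dest, done) {
  stream.pipeline(
    fs.createReadStream(src),   // compressed input
    zlib.createGunzip(),        // decompress on the fly
    fs.createWriteStream(dest), // decompressed output
    done                        // called once, with an error if any stage failed
  );
}
```

As with the code above, this only handles a single gzip-compressed file, not a zip archive.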

For tar archives, there is Isaacs' tar module, which is used by npm.

Edit 2: Updated answer as zlib doesn't support the zip format. This will only work for gzip.
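Because gzip and zip are easy to confuse, it can help to look at the first bytes of a download before picking a tool. The sketch below relies on the well-known signatures: gzip data starts with the bytes 1f 8b, and a zip file starts with "PK" followed by 03 04 (or 05 06 for an empty archive). The function name detectArchiveType is made up for illustration.

```javascript
'use strict';

// Hypothetical helper: guess the container format from the first bytes of a Buffer.
function detectArchiveType(buffer) {
  if (buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b) {
    return 'gzip'; // zlib.gunzip() / zlib.createGunzip() can decompress this
  }
  if (buffer.length >= 4 && buffer[0] === 0x50 && buffer[1] === 0x4b) {
    var isLocalHeader = buffer[2] === 0x03 && buffer[3] === 0x04;
    var isEmptyArchive = buffer[2] === 0x05 && buffer[3] === 0x06;
    if (isLocalHeader || isEmptyArchive) {
      return 'zip'; // zlib alone is not enough, a zip library is needed
    }
  }
  return 'unknown';
}
```

A .tar.gz file is reported as 'gzip' here: gunzipping it yields a tar archive, which still has to be unpacked with a tar module.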

Linus Thiel
  • 13
    No. Neither of these examples work. node.js's zlib module is only for streams and buffers that represent singular resources; not zip or tar archives. – pyrotechnick May 07 '12 at 10:29
  • 3
    You seem to be misunderstanding the question. He is not trying to uncompress single streams or a single file. He is trying to extract the files from entire archives. As you mention: Isaacs' tar module will indeed work with tars but your code for zips will not extract the files from zip archives. – pyrotechnick May 10 '12 at 02:54
15

yauzl is a robust library for unzipping. Design principles:

  • Follow the spec. Don't scan for local file headers. Read the central directory for file metadata.
  • Don't block the JavaScript thread. Use and provide async APIs.
  • Keep memory usage under control. Don't attempt to buffer entire files in RAM at once.
  • Never crash (if used properly). Don't let malformed zip files bring down client applications who are trying to catch errors.
  • Catch unsafe file names. A zip file entry throws an error if its file name starts with "/" or /[A-Za-z]:\// or if it contains ".." path segments or "\" (per the spec).

Currently has 97% test coverage.
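To illustrate the first principle ("read the central directory"), here is a rough sketch of how a reader can locate the End of Central Directory record, which sits at the very end of a zip file. This is not yauzl's code: the function name is made up, it expects the whole file in a Buffer, and it ignores Zip64 and multi-disk archives. The field offsets follow the zip format specification.

```javascript
'use strict';

var EOCD_SIGNATURE = 0x06054b50; // the bytes "PK\x05\x06" read as a little-endian uint32
var EOCD_MIN_SIZE = 22;          // size of the record without a trailing comment

// Hypothetical helper: scan backwards for the End of Central Directory record.
// Returns null when the buffer does not look like a zip file.
function findEndOfCentralDirectory(buffer) {
  for (var i = buffer.length - EOCD_MIN_SIZE; i >= 0; i--) {
    if (buffer.readUInt32LE(i) !== EOCD_SIGNATURE) { continue; }

    // The record ends with a comment whose length is stored at offset 20,
    // so a genuine record must end exactly at the end of the file.
    var commentLength = buffer.readUInt16LE(i + 20);
    if (i + EOCD_MIN_SIZE + commentLength !== buffer.length) { continue; }

    return {
      entryCount: buffer.readUInt16LE(i + 10),
      centralDirectorySize: buffer.readUInt32LE(i + 12),
      centralDirectoryOffset: buffer.readUInt32LE(i + 16)
    };
  }
  return null;
}
```

The central directory found this way is the authoritative list of entries. Scanning forward for local file headers can instead pick up stale copies of files that were later replaced in the archive.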

andrewrk
15

I tried a few of the nodejs unzip libraries including adm-zip and unzip, then settled on extract-zip which is a wrapper around yauzl. Seemed the simplest to implement.

https://www.npmjs.com/package/extract-zip

var extract = require('extract-zip')
extract(zipfile, { dir: outputPath }, function (err) {
   // handle err
})
Simon Hutchison
  • 4
    +1 [extract-zip](https://github.com/maxogden/extract-zip) and [yauzl](https://github.com/thejoshwolfe/yauzl) seem to be maintained more frequently than others. No commits to [unzip](https://github.com/EvanOxfeld/node-unzip) and [adm-zip](https://github.com/cthackers/adm-zip) in 3 years and tons of issues at the time of writing! – Akseli Palén Oct 12 '17 at 16:24
  • Side-note: You can also use it as promise, no need for promisifying. – Alexander Santos Dec 15 '20 at 15:45
5

I found success with the following, works with .zip
(Simplified here for posting: no error checking & just unzips all files to current folder)

function DownloadAndUnzip(URL){
    var unzip = require('unzip');
    var http = require('http');
    var request = http.get(URL, function(response) {
        response.pipe(unzip.Extract({path:'./'}))
    });
}
Mtl Dev
3

I was looking for this for a long time and found no simple working example, but based on these answers I created the downloadAndUnzip() function.

The usage is quite simple:

downloadAndUnzip('http://your-domain.com/archive.zip', 'yourfile.xml')
    .then(function (data) {
        console.log(data); // unzipped content of yourfile.xml in root of archive.zip
    })
    .catch(function (err) {
        console.error(err);
    });

And here is the declaration:

var AdmZip = require('adm-zip');
var request = require('request');

var downloadAndUnzip = function (url, fileName) {

    /**
     * Download a file
     * 
     * @param url
     */
    var download = function (url) {
        return new Promise(function (resolve, reject) {
            request({
                url: url,
                method: 'GET',
                encoding: null
            }, function (err, response, body) {
                if (err) {
                    return reject(err);
                }
                resolve(body);
            });
        });
    };

    /**
     * Unzip a Buffer
     * 
     * @param buffer
     * @returns {Promise}
     */
    var unzip = function (buffer) {
        return new Promise(function (resolve, reject) {

            var resolved = false;

            var zip = new AdmZip(buffer);
            var zipEntries = zip.getEntries(); // an array of ZipEntry records

            zipEntries.forEach(function (zipEntry) {
                if (zipEntry.entryName == fileName) {
                    resolved = true;
                    resolve(zipEntry.getData().toString('utf8'));
                }
            });

            if (!resolved) {
                reject(new Error('No file found in archive: ' + fileName));
            }
        });
    };


    return download(url)
        .then(unzip);
};
Adam
  • 2
    this is not scalable, as it uses memory – Vanuan Apr 24 '16 at 11:43
  • 1
    Code might be much simpler: const zipEntries = new AdmZip(buffer).getEntries() const output = zipEntries.filter(zipEntry => zipEntry.entryName == fileName).map(zipEntry => zipEntry.getData().toString('utf8')) return output.length > 0 ? resolve(output) : reject(new Error('No file found in archive: ' + fileName)) – sgracki Jan 10 '19 at 14:53
0

Check out gunzip-file

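import fs from 'fs';
import gunzip from 'gunzip-file';

const unzipAll = async () => {
  try {
    const compFiles = fs.readdirSync('tmp')
    await Promise.all(compFiles.map(file => {
      if (file.endsWith(".gz")) {
        // gunzip-file reports completion through a callback, so wrap it in a
        // promise; otherwise Promise.all has nothing to wait for
        return new Promise(resolve => {
          gunzip(`tmp/${file}`, `tmp/${file.slice(0, -3)}`, resolve)
        });
      }
    }));
  }
  catch(err) {
    console.log(err)
  }
}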
arnaudjnn
-1

Another working example:

var zlib = require('zlib');
var tar = require('tar');
var ftp = require('ftp');

var files = {}; // map of file name -> file contents

var conn = new ftp();
conn.on('connect', function(e) 
{
    conn.auth(function(e) 
    {
        if (e)
        {
            throw e;
        }
        conn.get('/tz/tzdata-latest.tar.gz', function(e, stream) 
        {
            stream.on('success', function() 
            {
                conn.end();

                console.log("Processing files ...");

                for (var name in files)
                {
                    var file = files[name];

                    console.log("filename: " + name);
                    console.log(file);
                }
                console.log("OK")
            });
            stream.on('error', function(e) 
            {
                console.log('ERROR during get(): ' + e);
                conn.end();
            });

            console.log("Reading ...");

            stream
            .pipe(zlib.createGunzip())
            .pipe(tar.Parse())
            .on("entry", function (e) 
            {    
                var filename = e.props["path"];
                console.log("filename:" + filename);
                if( files[filename] == null )
                {
                    files[filename] = "";
                }
                e.on("data", function (c) 
                {
                    files[filename] += c.toString();
                })    
            });
        });
    });
})
.connect(21, "ftp.iana.org");
-1

Download and extract for .tar.gz:

const https = require("https");
const tar = require("tar");

https.get("https://url.to/your.tar.gz", function(response) {
  response.pipe(
    tar.x({
      strip: 1,
      C: "some-dir"
    })
  );
});
vitaliytv
-3

You can also extract existing zip files with "unzip". It works for files of any size; you need to add it as a dependency from npm.

var fs = require('fs');
var unzip = require('unzip');

fs.createReadStream(filePath).pipe(unzip.Extract({path:moveIntoFolder})).on('close', function(){
    //To do after unzip
    callback();
});