7

I want to download gzipped csv files from a web server and ungzip them in the browser.

So far I have tried using pako and zlib to decompress a file gzipped on my server, but have had various issues. Trying to unzip a unix-gzipped file, I kept getting an incorrect header message.

Next, I tried using node to zip the file on the server, and am currently getting this error:

Uncaught Error: invalid file signature:,�

Here is the command I am using to get the file:

$.ajax({ type: "GET", url: 'public/pols_zlib.csv.gz'})
  .done(function(d){
    var gunzip = new Zlib.Gunzip(d);
    plain = gunzip.decompress(); 
  });

I am looking for any way to zip a file on my server and unzip it in the browser.

Solomon
  • Do the compression at the HTTP level and let the browser take care of decompressing it behind the scenes. – Quentin Jun 30 '14 at 12:37
  • quentin, what exactly does that look like in an ajax call? – Solomon Jun 30 '14 at 12:45
  • Like a load of compressed data with a response header saying that it is compressed. – Quentin Jun 30 '14 at 12:57
  • @Quentin would that mean I would need to change something on the server that served the zipped files? Or, could I set that in the javascript call? I don't control server options in this case, so changing how the server sends the file is not really an option. Whatever the solution would be, it would have to use javascript. – Solomon Jun 30 '14 at 14:18
  • Maybe you will need a server side script to generate an HTTP response with the proper header and write your CSV file into the response stream. Are you familiar with any server side technology? – gustavodidomenico Jul 02 '14 at 15:28
  • @gustavodidomenico for this case, I don't control the server, and can't change the response headers, so that's why I need the solution in javascript – Solomon Jul 02 '14 at 16:23

5 Answers

2

You do not need to gzip the .csv files on the server (unless your main goal is to save disk space on the server). This answer assumes your goal is to reduce the time it takes to download the .csv file to the client.

As Quentin mentioned above, all modern web servers handle over-the-wire compression for you. This means the .csv files (and all text-based documents for that matter) can be compressed before being sent to the client. The client (the web browser) then decompresses the file for you. To ensure these things are working correctly, you can sniff the HTTP traffic using a tool like Fiddler. This screenshot shows how this web page is compressed using GZIP.

enter image description here

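To ensure compression is used, both the server and client need to 'advertise' the fact using HTTP headers. Browsers normally send the Accept-Encoding request header on their own, and may refuse to let a script set it, but expressed in ajax it would look like this: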

$.ajax({
  ...
  headers: { "Accept-Encoding" : "gzip" },
  ...
});

If the server has compression enabled, it would respond with the following HTTP header:

Content-Encoding: gzip

As seen here in Fiddler:

enter image description here

You can read more about HTTP compression here.

Finally, I recommend turning HTTP compression on/off and using Fiddler to benchmark the results.
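
The same check can also be made from script rather than from a traffic sniffer. This is a minimal sketch, assuming a same-origin request made with the Fetch API (cross-origin responses only expose this header if the server allows it). The helper name `isGzipEncoded` and the URL in the usage comment are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: reports whether a response advertises gzip
// Content-Encoding. `headers` is any object with a get(name) method,
// such as the Headers object on a Fetch API response.
function isGzipEncoded(headers) {
  var encoding = headers.get("Content-Encoding") || "";
  // The header may list several encodings separated by commas.
  return encoding.split(",").some(function (token) {
    return token.trim().toLowerCase() === "gzip";
  });
}

// Usage in a browser (placeholder URL, assumed same-origin):
// fetch("public/data.csv").then(function (res) {
//   console.log("gzip on the wire:", isGzipEncoded(res.headers));
// });
```

If this reports false for a text file, the server is probably not compressing it, which matches the situation described in the comments below.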

Shawn McGough
  • In this case, I do not have control over the server, and the files I upload need to be under a certain size in order to be uploaded. That is why all the decompression has to be done in javascript, and I cannot change how the server replies. In this case, the server does not reply with the correct headers that will have the browser automatically ungzip the file. Additionally, jquery will not set accept encoding, with the error message that it is an unsafe header. – Solomon Jul 09 '14 at 18:12
  • I see. In that case, this response from another thread might be what you are looking for: http://stackoverflow.com/a/5633128/139774 It discusses the use of JSXCompressor, which can decompress on the client. – Shawn McGough Jul 09 '14 at 19:15
2

Another answer, this time for a pure binary solution that requires the browser to support typed arrays. With this method, there is no need to use base64 encoding, which allows for a smaller file size. This solution is recommended when older browser support is not a requirement.

Download and add a reference to pako_inflate.min.js.

Here is the HTML that I have tested.

<html>
<head>
    <title>Binary Example</title>
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
    <script src="~/Scripts/pako_inflate.min.js" type="text/javascript"></script>
    <script type="text/javascript">
        var oReq = new XMLHttpRequest();
        oReq.open("GET", 'file.csv.gz?_=' + new Date().getTime(), true);
        oReq.responseType = "arraybuffer";
        oReq.onload = function (oEvent) {
            var arrayBuffer = oReq.response; // Note: not oReq.responseText
            if (arrayBuffer) {
                var byteArray = new Uint8Array(arrayBuffer);
                var data = pako.inflate(byteArray);
                //$('body').append(String.fromCharCode.apply(null, new Uint16Array(data)));  // debug
                $('#link').attr("href", "data:text/csv;base64," + btoa(String.fromCharCode.apply(null, new Uint16Array(data))));
            }
        };
        oReq.send(null);
    </script>
</head>
<body>
    <a id="link" download="file.csv">file</a>
</body>
</html> 
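
One caveat with the conversion above: `String.fromCharCode.apply` passes every byte as a separate argument, which can throw a RangeError on large files in some engines. Below is a minimal sketch of a chunked variant. The helper name `bytesToBinaryString` and the 32 KB chunk size are assumptions chosen for illustration, not part of the tested example.

```javascript
// Hypothetical helper: converts a Uint8Array into a binary string
// (one character per byte) in fixed-size chunks, so that no single
// call receives more arguments than the engine allows.
function bytesToBinaryString(bytes, chunkSize) {
  chunkSize = chunkSize || 0x8000; // assumed chunk size of 32768 bytes
  var parts = [];
  for (var i = 0; i < bytes.length; i += chunkSize) {
    // subarray creates a view on the same buffer, not a copy.
    parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize)));
  }
  return parts.join("");
}
```

With this helper, the link could be built from `btoa(bytesToBinaryString(data))`, where `data` is the `Uint8Array` returned by `pako.inflate`.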
Shawn McGough
1

I believe my earlier answer has value so I am creating a separate one here that addresses this more specific use case. The conditions are:

  1. cannot control the server
  2. must limit the file size of the csv's prior to uploading
  3. the server is not encoding the csv's with gzip

I suggest using the JSXCompressor library to decode gzip files in javascript on the client.

However, the gzip'd files must first be base64 encoded. The following Linux command will do this:

gzip -c file.csv | base64 > file.csv.gz.txt

I recommend using the .txt file extension to ensure the server handles it like text.

Since I'm using the DataURI to download the csv (see below), you could also base64 encode it before gzipping to save doing that on the client. However, it increases the file size (which you are trying to avoid).

Once the files are gzip'd & base64'd, they can be uploaded to the server. Note that base64 will add substantial overhead but it is required. It is more pronounced with smaller files:

uncompressed:        91 kB
compressed:          38 kB
compressed + base64: 72 kB

uncompressed:        8.2 MB
compressed:          1.9 MB
compressed + base64: 2.6 MB

Here is the HTML markup. This is a working example that I have tested.

<html>
<head>
    <title>Working Example</title>
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
    <script src="jsxcompressor.min.js" type="text/javascript"></script>
    <script type="text/javascript">
        $.ajax({
            url: "/file.csv.gz.txt",
            cache: false
        })
            .done(function (b64file) {
                // $('body').append(b64file);  // debug
                var binary = JXG.decompress(b64file);
                $('#link').attr("href", "data:text/csv;base64," + btoa(binary));
            });
    </script>

</head>
<body>
    <a id="link" download="file.csv">file</a>
</body>
</html> 
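
If typed arrays are available, the base64 text can also be turned back into raw gzip bytes with the built-in `atob`, and then passed to any inflate routine that accepts a `Uint8Array`. This is a minimal sketch. `base64ToBytes` is a hypothetical helper name, and stripping whitespace is a precaution because the `base64` command wraps its output across lines.

```javascript
// Hypothetical helper: decodes base64 text into a Uint8Array.
function base64ToBytes(b64) {
  // Remove line breaks and other whitespace before decoding.
  var binary = atob(b64.replace(/\s+/g, ""));
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    // atob yields one character per byte, so the char code is the byte.
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

For gzip data the first two bytes should be 0x1f and 0x8b, so `base64ToBytes("H4s=")` yields exactly those two bytes.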
Shawn McGough
  • Glad it worked. I've added yet another answer for a pure binary solution that allows you to omit the base64 step (thus reducing file size) if you don't need to support older browsers. – Shawn McGough Jul 11 '14 at 15:05
0

googling "zip and unzip in php and js" gave me this:

Quicker
  • Unfortunately, following the answers from these questions is what led to the invalid file signature issue. – Solomon Jul 07 '14 at 12:21
0

You are having this issue because the ajax call will respond with the header text/html.

Maybe something like this can help you:

jQuery File Download Plugin for Ajax
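
Whatever ends up fetching the file, it may help to confirm that the bytes arriving in the script are still gzip data before decompressing them. A gzip stream starts with the bytes 0x1f 0x8b. If the response was decoded as text on the way, bytes above 0x7f tend to be replaced, which would be consistent with the replacement character in the error message from the question. This is a minimal sketch, with `looksLikeGzip` and `decodeAsUtf8Text` as hypothetical helper names.

```javascript
// Hypothetical helper: checks the two-byte gzip signature.
function looksLikeGzip(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Hypothetical helper: shows what happens when binary data is
// treated as UTF-8 text, as a text-mode ajax response would be.
function decodeAsUtf8Text(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

var signature = new Uint8Array([0x1f, 0x8b]);
console.log(looksLikeGzip(signature)); // true

// 0x8b is not valid UTF-8 on its own, so it becomes U+FFFD.
var asText = decodeAsUtf8Text(signature);
console.log(asText.charCodeAt(1) === 0xfffd); // true
```

Once a byte has been replaced this way, the original value cannot be recovered from the string, so the request has to ask for binary data up front.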

peterpeterson
  • Thanks for the answer! Unfortunately, the problem here is using the file information within the browser, not downloading the file, so this plugin wouldn't solve the issue. – Solomon Jul 09 '14 at 18:14
  • But with this plugin you won't download the file; you will have the file content in a variable, as when you do an ajax call. – peterpeterson Jul 10 '14 at 09:11