Why can't I extract a zip file from a POST request?

Question

I have a piece of client-side code that exports a .docx file from Google Drive and sends the data to my server. It's pretty straightforward: it exports the file, makes it into a blob, and sends the blob to a POST endpoint.

gapi.client.drive.files.export({
    fileId: file_id,
    mimeType: "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
}).then(function (response) {

    // the zip file data is now in response.body
    var blob = new Blob([response.body], {type: "application/vnd.openxmlformats-officedocument.wordprocessingml.document"});

    // send the blob to the server to extract
    var request = new XMLHttpRequest();
    request.open('POST', 'return-xml.php', true);
    request.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
    request.onload = function() {
        // the extracted data is in the request.responseText
        // do something with it
    };

    request.send(blob);
});

Here is my server-side code to save this file onto my server so I can do things with it:

<?php
file_put_contents('tmp/document.docx', fopen('php://input', 'r'));

When I run this, the file is created on my server. However, I believe it is corrupted, because when I try to unzip it (as you can do with .docx), this happens:

$ mv tmp/document.docx tmp/document.zip
$ unzip tmp/document.zip
Archive:  document.zip
error [document.zip]:  missing 192760059 bytes in zipfile
  (attempting to process anyway)
error [document.zip]:  start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

Why isn't it recognizing it as a proper .zip file?

Note for the future reader: I'm still not sure how to do this. I think I was just trying too hard to fit a zip-file-shaped peg into an access-token-shaped hole. So, I restructured the application to make the gapi export calls on the backend and do stuff with the extracted data there. — Lincoln Bergeson, May 25 '17 at 18:04

Radon8472 · Answer 1 · 2017-07-11T13:06:23.220

You should first download the original zip and compare its content with what you receive on your server. You can do this with, for example, Total Commander or the command-line "diff" tool.

When you do this, you will see whether your zip is changed during transfer. With this information you can continue investigating WHY it is changed. For example, if ASCII 10 in your zip file is transformed to "13" or "13 10", it could be a line-ending problem in the file transfer.

When you open files in PHP with fopen(..., 'r') on Windows, \n characters can be translated. You could try fopen(..., 'rb'), which forces BINARY reading of the file without translating line endings.

@see: https://stackoverflow.com/a/7652022/2377961

@see php documentation fopen
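
The comparison suggested above can also be scripted. The following is a minimal sketch for Node.js and is not part of the original answer; the function name `firstDifference` and the sample bytes are hypothetical. It reports the first offset at which two byte sequences differ, which helps to see whether bytes were altered or added in transit.

```javascript
// Hypothetical helper: returns the first offset at which two byte
// sequences differ, or -1 if they are identical.
function firstDifference(a, b) {
    const shared = Math.min(a.length, b.length);
    for (let i = 0; i < shared; i++) {
        if (a[i] !== b[i]) {
            return i;
        }
    }
    // Same prefix: they differ only if one is longer than the other.
    return a.length === b.length ? -1 : shared;
}

// Made-up sample data: a line feed (10) expanded to CR LF (13 10).
const original = Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 10, 0x41]);
const received = Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 13, 10, 0x41]);

console.log(firstDifference(original, received)); // prints 4
```

With real files, the two inputs could be loaded with `fs.readFileSync(path)`, which returns a `Buffer` that can be indexed byte by byte in the same way.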

score 3 · Answer 2 · answered May 24 '17 at 22:17

I think it may be caused by that "application/x-www-form-urlencoded". When you read the request data with php://input, it may also save some HTTP properties, so the .zip is corrupted. Try to open the .zip file and look at what is inside. If the problem is what I described, try changing the Content-type to application/octet-stream to fix it.

Riccardo Bonafede


How do you recommend opening the zip file to see what's inside? I can't unzip it... – Lincoln Bergeson May 24 '17 at 23:38
I didn't say to unzip it. Try looking at it with a hex dumper (or a normal editor, just to see if there is some HTTP POST data) – Riccardo Bonafede May 24 '17 at 23:42
It doesn't appear that `php://input` contains any response information. Changing the content type to `application/octet-stream` didn't do anything :( – Lincoln Bergeson May 25 '17 at 14:48
Check whether the file retrieved in JavaScript is the same as the one saved by PHP. Use a hashing function – Riccardo Bonafede May 25 '17 at 14:56

score 2 · Answer 3 · answered Jul 12 '17 at 12:45

I would suggest using base64 to encode the binary data into a text stream before posting. I've done this before and it works well; URL encoding for binary data isn't going to work. Then, on your server, base64-decode the data to convert it back to binary before storing.

Once it's in base64 you can post it as text.
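
A minimal sketch of the encoding step, under assumptions that are not in the original answer: the data is available as a `Uint8Array`, and `btoa` is available (browsers and recent Node.js versions). The helper name `bytesToBase64` is hypothetical. On the PHP side, `base64_decode` would turn the text back into binary before saving.

```javascript
// Hypothetical helper: encode raw bytes as base64 text.
function bytesToBase64(bytes) {
    let binary = "";
    // Build a string with one character per byte (code points 0-255),
    // which is the input that btoa expects.
    for (let i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]);
    }
    return btoa(binary);
}

// The local file header signature "PK\x03\x04" that a zip archive
// (and so a .docx) normally starts with.
const zipSignature = Uint8Array.from([0x50, 0x4b, 0x03, 0x04]);
console.log(bytesToBase64(zipSignature)); // prints "UEsDBA=="
```

The trade-off is size: base64 text is roughly a third larger than the binary it encodes.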

score 1 · Answer 4 · answered Jul 12 '17 at 16:21

Well, to me it is not a ZIP file. Looking at the Drive API, you can see that application/vnd.openxmlformats-officedocument.wordprocessingml.document is not zipped, like application/zip is. You should handle the file as a DOCX, I think. Have you tried that?

Holzhey


Yeah, when I export the file as a .docx locally I can extract just like a zip file. – Lincoln Bergeson Jul 12 '17 at 18:11
OK, I did not know that docx can be unzipped! Thanks! – Holzhey Jul 12 '17 at 19:35

score 0 · Answer 5 · answered Jul 12 '17 at 19:35

You are sending a BLOB (binary file) with "Content-type" set to "application/x-www-form-urlencoded", with no URL encoding applied to the BLOB. So the file that PHP receives is not a ZIP file; it's a corrupted one. Change the "Content-type" or apply URL encoding to the BLOB. You can get a better idea by looking at MDN - Sending forms through JavaScript. These questions should help too: question 1, question 2. You must send the file properly.
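
One way the bytes can change before they ever leave the browser: when a `Blob` is built from a JavaScript string, the string is encoded as UTF-8, so any character above 127 becomes more than one byte. The sketch below assumes, without confirmation from the question, that `response.body` is a string holding one byte per character; `binaryStringToBytes` is a hypothetical helper that converts such a string into a `Uint8Array` so the bytes are sent unchanged.

```javascript
// Hypothetical helper: map each character of a "binary string" to one byte.
function binaryStringToBytes(str) {
    const bytes = new Uint8Array(str.length);
    for (let i = 0; i < str.length; i++) {
        // Keep only the low 8 bits of each character code.
        bytes[i] = str.charCodeAt(i) & 0xff;
    }
    return bytes;
}

// Made-up sample: the zip signature followed by one byte above 127.
const body = "PK\u0003\u0004\u00ff";

// UTF-8 encoding (what a Blob does with a string) yields 6 bytes...
const utf8Length = new TextEncoder().encode(body).length;
// ...while the byte-for-byte conversion keeps the original 5.
const rawLength = binaryStringToBytes(body).length;

console.log(utf8Length, rawLength); // prints 6 5
```

In the browser, the converted bytes could then be sent with `request.setRequestHeader("Content-Type", "application/octet-stream")` followed by `request.send(new Blob([bytes]))`, where `bytes` is the result of `binaryStringToBytes(response.body)`.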
