We receive a local file (typically a PDF, PNG or JPG) by drag and drop, using dropzone.js. At this stage it sits in a variable as base64, prefixed by the characters that specify the file type. We encrypt it (now it's binary) into a JavaScript variable. We then create a Blob from that variable and upload it to a server running PHP. (See how we found out how to send a JS variable to PHP's $_FILES.)

We are finding that the .size of the blob is about 50% larger than the .length of the file we are uploading. (We had been uploading by converting to base64 then uploading with JSON, but one reason we are looking to change is to hopefully avoid the 33% bump in size from using base64.)

The blob is consistently about 50% larger from moderate sizes up to larger sizes. As a small test, we created a Blob using 120 chars as input and found the Blob.size to be 210. (We normally use the correct file.type; image/png was just to have it be interpreted as binary data that didn't need encoding.) From actual use in our code: we uploaded a 900K PDF file. Type was something like 'application/pdf'. The resultant blob was like 1,400K. Also tried with PNG.

I would think that the Blob should be about the same size as the input, no? What might we be doing wrong?

new Blob(["123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"], {type:"image/png"});
Mark Kasson
  • I haven't done the math, but Blob length is equivalent to blob.size, the size in bytes of the Blob, while String.length is the count of chars in the string. However, one char = 2 bytes. I understand that with this logic the blob.size would be twice that of the string; I'll do a little googling and get back to you on that, but this has to be the long and short of it. – TechnicalChaos Apr 29 '15 at 19:05
  • Sorry, .size is the correct property name. I'll correct it above. We don't usually use text; I just did that as a quick test. (I'm not sure about the encoding using 2 bytes/char by default, but I'll take your word for it.) Usually our files are things like PDF or PNG. I'll be clearer about that as well. – Mark Kasson Apr 29 '15 at 19:19
  • They're equivalent, so it matters little. However, when I started running some tests on your sample string, I found it is indeed 210 characters in length, and I can't replicate the issue on a shorter string. x = new Blob(["12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"]) Blob {type: "", size: 110, slice: function} – TechnicalChaos Apr 29 '15 at 19:23
  • My research led me to this post: http://stackoverflow.com/questions/23795034/creating-a-blob-or-a-file-from-javascript-binary-string-changes-the-number-of-by – TechnicalChaos Apr 29 '15 at 19:50
  • Homing in on the solution. This is in a similar direction but uses forge (which we are using). http://stackoverflow.com/questions/28585353/how-can-i-encrypt-and-decrypt-a-pdf-blob-with-forge-and-store-in-localstorage – Mark Kasson Apr 30 '15 at 00:02

2 Answers

There were three factors that led to the increase in size.

Our first issue was that we were reading the file using FileReader's readAsDataURL. This reads a file and encodes it in base64, which results in a roughly 33% increase in size. We changed to readAsArrayBuffer and read into a Uint8Array (an array of 8-bit bytes).
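
The 33% figure follows from base64 mapping every 3 input bytes to 4 output characters. A rough sketch of the arithmetic (the helper name `base64Length` is made up for illustration, and a data URL adds a short prefix such as `data:application/pdf;base64,` on top of this):

```javascript
// Hypothetical helper: length of the base64 text for `byteCount` raw bytes.
// Base64 emits 4 characters per 3-byte group and pads the last group.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

const rawBytes = 900 * 1024;            // a file of about 900K, as in the question
const encoded = base64Length(rawBytes); // characters after base64 encoding
console.log(encoded / rawBytes);        // ratio close to 4/3
```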

We pass the file to the encryption library forge.js, which only accepts data as a string, so we had to convert the binary ArrayBuffer to a string. We used the more performant solution here. This reference is more thorough and refers to the relatively new TextEncoder/TextDecoder APIs. We haven't gotten to using them yet. I'd guess they perform better as they're purely native.
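
A minimal sketch of one common way to do this conversion, not necessarily the linked solution: each byte becomes one character whose code is the byte value. The helper name `bytesToBinaryString` and the chunk size are assumptions chosen for illustration; chunking keeps each `String.fromCharCode.apply` call to a modest number of arguments.

```javascript
// Hypothetical helper: turn raw bytes into a "binary string" in which
// each character's code (0-255) is one byte.
function bytesToBinaryString(bytes) {
  const CHUNK = 0x8000; // keep each apply() call's argument list small
  let result = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    result += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return result;
}

// Stand-in for the Uint8Array read from the file.
const sample = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0xff, 0x00, 0x80]);
const binaryString = bytesToBinaryString(sample);
console.log(binaryString.length); // one character per input byte
```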

Once forge does the encryption, we have to convert to a Blob, so see this on how to convert ArrayBuffer to and from Blob.

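Second, as @TechnicalChaos pointed out, we were building the Blob from a binary string. The Blob constructor encodes strings as UTF-8, so every character with a code above 0x7F is written as 2 bytes. In encrypted data about half the bytes fall in that range, which accounts for the roughly 50% increase.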
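
The growth is easy to reproduce. In this sketch a binary string containing every byte value once is passed to Blob both as a string and as bytes. The Blob constructor serializes string parts as UTF-8, so the 128 characters with codes 0x80 to 0xFF take 2 bytes each, while the Uint8Array is stored as-is.

```javascript
// Build a 256-character binary string covering every byte value once.
let binaryString = "";
for (let code = 0; code < 256; code++) {
  binaryString += String.fromCharCode(code);
}

// Passing the string: codes 0x80-0xFF become 2 bytes each in UTF-8.
const fromString = new Blob([binaryString]);

// Passing bytes: one byte per character code, stored unchanged.
const bytes = Uint8Array.from(binaryString, (ch) => ch.charCodeAt(0));
const fromBytes = new Blob([bytes]);

console.log(binaryString.length, fromString.size, fromBytes.size); // 256 384 256
```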

The blob could then be attached to a form to be uploaded to our PHP server into $_FILES.
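
As a rough sketch of that last step, assuming the encrypted data is already in a Uint8Array: the helper name `buildUploadForm`, the field name `file`, the file name and the endpoint in the trailing comment are all hypothetical.

```javascript
// Hypothetical helper: wrap encrypted bytes in a Blob and attach it to a form.
// The field name "file" and the default MIME type are assumptions.
function buildUploadForm(encryptedBytes, fileName, mimeType = "application/octet-stream") {
  const blob = new Blob([encryptedBytes], { type: mimeType });
  const form = new FormData();
  form.append("file", blob, fileName); // PHP would see this as $_FILES["file"]
  return form;
}

const form = buildUploadForm(new Uint8Array([1, 2, 3, 250, 251]), "report.pdf.enc");
console.log(form.get("file").size); // byte count of the attached part

// Sending it, assuming a hypothetical endpoint:
// fetch("/upload.php", { method: "POST", body: form });
```

Because the Blob is built from bytes rather than a string, the uploaded part has the same byte count as the encrypted data.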

Now our uploads are approximately the same size as the files we encrypt.

Brett Zamir
Mark Kasson

I had a similar issue with putting binary data into a JavaScript Blob. It turns out Blob was assuming UTF-8 encoding, so some of the raw data bytes ended up as multibyte characters.

The solution was to put each byte of binary data into a Uint8Array and pass that to Blob instead.
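
A minimal sketch of that fix, assuming the binary data arrives as a string whose character codes are byte values (the helper name `binaryStringToBlob` and the sample data are made up for illustration):

```javascript
// Hypothetical helper: copy each character code of a binary string into one byte,
// then build the Blob from the bytes instead of from the string.
function binaryStringToBlob(binaryString, mimeType) {
  const bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    bytes[i] = binaryString.charCodeAt(i) & 0xff;
  }
  return new Blob([bytes], { type: mimeType });
}

const encrypted = "\x00\x7f\x80\xc3\xff"; // stand-in for encrypted output
const blob = binaryStringToBlob(encrypted, "application/octet-stream");
console.log(blob.size === encrypted.length); // true
```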

tschumann