1436

In HTTP there are two ways to POST data: application/x-www-form-urlencoded and multipart/form-data. I understand that most browsers are only able to upload files if multipart/form-data is used. Is there any additional guidance on when to use one of the encoding types in an API context (no browser involved)? This might, for example, be based on:

  • data size
  • existence of non-ASCII characters
  • existence of (unencoded) binary data
  • the need to transfer additional data (like filename)

I basically found no formal guidance on the web regarding the use of the different content-types so far.

max
  • 90
    It should be mentioned that these are the two MIME types that HTML forms use. HTTP itself has no such limitation... one can use whatever MIME type he wants via HTTP. – tybro0103 Mar 21 '14 at 15:18

6 Answers

2139

TL;DR

Summary: if you have binary (non-alphanumeric) data (or a significantly sized payload) to transmit, use multipart/form-data. Otherwise, use application/x-www-form-urlencoded.


The MIME types you mention are the two Content-Type headers for HTTP POST requests that user-agents (browsers) must support. The purpose of both of those types of requests is to send a list of name/value pairs to the server. Depending on the type and amount of data being transmitted, one of the methods will be more efficient than the other. To understand why, you have to look at what each is doing under the covers.

For application/x-www-form-urlencoded, the body of the HTTP message sent to the server is essentially one giant query string -- name/value pairs are separated by the ampersand (&), and names are separated from values by the equals symbol (=). An example of this would be: 

MyVariableOne=ValueOne&MyVariableTwo=ValueTwo
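As a minimal sketch (not part of the original answer), Python's standard `urllib.parse` module can build and parse such a body. The field names are the ones from the example above; everything else is illustrative.

```python
from urllib.parse import parse_qs, urlencode

# Build the request body: name/value pairs joined by '&', names and values by '='.
fields = {"MyVariableOne": "ValueOne", "MyVariableTwo": "ValueTwo"}
body = urlencode(fields)
print(body)  # MyVariableOne=ValueOne&MyVariableTwo=ValueTwo

# The receiving side reverses the process; each name maps to a list of values.
decoded = parse_qs(body)
print(decoded["MyVariableOne"])  # ['ValueOne']
```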

According to the specification:

[Reserved and] non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character

That means that for each non-alphanumeric byte that exists in one of our values, it's going to take three bytes to represent it. For large binary files, tripling the payload is going to be highly inefficient.
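The size penalty can be checked with a short sketch. It uses `urllib.parse.quote_from_bytes` from the Python standard library; the helper name `percent_encoded_size` and the sample bytes are made up for illustration.

```python
from urllib.parse import quote_from_bytes


def percent_encoded_size(raw: bytes) -> int:
    """Length of the percent-encoded form of raw (no extra safe characters)."""
    return len(quote_from_bytes(raw, safe=""))


text = b"ValueOne"                        # alphanumeric: passes through unchanged
binary = bytes([0x00, 0xFF, 0x80, 0x1B])  # non-alphanumeric: each byte becomes %HH

print(percent_encoded_size(text), len(text))      # 8 8
print(percent_encoded_size(binary), len(binary))  # 12 4
print(quote_from_bytes(binary, safe=""))          # %00%FF%80%1B
```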

That's where multipart/form-data comes in. With this method of transmitting name/value pairs, each pair is represented as a "part" in a MIME message (as described by other answers). Parts are separated by a particular string boundary (chosen specifically so that this boundary string does not occur in any of the "value" payloads). Each part has its own set of MIME headers like Content-Type, and particularly Content-Disposition, which can give each part its "name." The value piece of each name/value pair is the payload of each part of the MIME message. The MIME spec gives us more options when representing the value payload -- we can choose a more efficient encoding of binary data to save bandwidth (e.g. base 64 or even raw binary).
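To make the layout of the parts concrete, here is a hand-rolled sketch of a multipart/form-data body. It is an illustration under assumptions, not a complete implementation: the function name `encode_multipart`, the field names, and the use of a random UUID as boundary are all hypothetical choices, and names containing quotes or line breaks are not escaped.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Return (body, content_type) for a multipart/form-data request.

    fields: {name: text value}
    files:  {name: (filename, content_type, payload bytes)}
    """
    boundary = boundary or uuid.uuid4().hex  # assumption: random hex boundary
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for name, value in fields.items():
        chunks += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",  # blank line separates part headers from the payload
            value.encode("utf-8"),
        ]
    for name, (filename, content_type, payload) in files.items():
        chunks += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            payload,  # raw bytes, no percent-encoding or base64
        ]
    chunks += [delimiter + b"--", b""]  # closing delimiter plus final CRLF
    return b"\r\n".join(chunks), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    {"MyVariableOne": "ValueOne"},
    {"upload": ("data.bin", "application/octet-stream", b"\x00\x01\x02")},
    boundary="BOUND",
)
```

The returned `content_type` string goes into the request's Content-Type header so the receiver knows which boundary to split on.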

Why not use multipart/form-data all the time? For short alphanumeric values (like most web forms), the overhead of adding all of the MIME headers is going to significantly outweigh any savings from more efficient binary encoding.
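A rough way to see the trade-off is to compare the bytes each encoding needs for a single field. The sketch below assumes a 32-character boundary and only a Content-Disposition header per part; both helper names are hypothetical.

```python
from urllib.parse import urlencode

BOUNDARY = "x" * 32  # assumption: a 32-character boundary string


def urlencoded_size(name: str, value: bytes) -> int:
    """Bytes needed for one name=value pair in a urlencoded body."""
    return len(urlencode({name: value}))


def multipart_part_size(name: str, value: bytes) -> int:
    """Bytes needed for one part: delimiter, one header, blank line, payload."""
    head = f'--{BOUNDARY}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
    return len(head.encode("ascii")) + len(value) + len(b"\r\n")


short_text = b"hello"
binary_blob = bytes(10_000)  # 10,000 zero bytes, each percent-encoded as %00

for label, value in (("short text", short_text), ("binary blob", binary_blob)):
    print(label, urlencoded_size("q", value), multipart_part_size("q", value))
```

For the short value the multipart headers dominate; for the binary blob the percent-encoding does.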

Roel Van de Paar
Matt Bridges
  • 87
    Does x-www-form-urlencoded have a length limit, or is it unlimited? – Pacerier Mar 09 '13 at 16:56
  • Base64 might be more efficient than URL-encoded, but straight binary is the most efficient. – ZiggyTheHamster Apr 18 '13 at 23:20
  • 39
    @Pacerier The limit is enforced by the server receiving the POST request. See this thread for more discussion: http://stackoverflow.com/questions/2364840/what-is-the-size-limit-of-a-post-request – Matt Bridges May 28 '13 at 13:23
  • 5
    @ZiggyTheHamster JSON and BSON are each more efficient for different types of data. Base64 is inferior to gzip, for both serialization methods. Base64 does not bring any advantages at all; HTTP supports binary payloads. – Tiberiu-Ionuț Stan Jun 04 '13 at 17:28
  • 1
    @Tiberiu-IonuțStan ziggy's comment was in response to my original answer, which didn't mention raw binary. – Matt Bridges Jun 05 '13 at 12:01
  • 19
    Also note that if a form contains a named file upload, your only choice is form-data, because urlencoded doesn't have a way to place the filename (in form-data it's the name parameter to content-disposition). – Guido van Rossum Mar 06 '14 at 18:00
  • Should your answer be updated? I heard that `x-` formed mimes types are discouraged to use. – Fractaliste Mar 17 '14 at 09:27
  • 1
    @Fractaliste not really -- it's a part of the HTML401 spec explicitly, and the HTML5 documentation still uses the x- mimetype. http://www.w3.org/TR/html5/forms.html#dom-fs-enctype – Matt Bridges Mar 19 '14 at 14:37
  • 1
    This answer is incomplete and could potentially cause you trouble - see my answer below for the reason. – EML Apr 18 '14 at 11:10
  • 4
    @EML see my parenthetical "(chosen specifically so that this boundary string does not occur in any of the "value" payloads)" – Matt Bridges Apr 18 '14 at 12:23
  • 1
    @MB: sure, I appreciate that, but your answer focusses only on payload size, and you suggest 'even raw binary' encoding, which won't do the job. You have to select the boundary and the encoding in conjunction with each other, which is the difficult part of this problem. Another issue I didn't mention below is that writing a parser for `form-data` is much harder than writing a decoder for URL-encoding, which may be relevant in a generalised API context (which is the case for me, at least). – EML Apr 18 '14 at 17:32
  • You should also mention that the default is "application/x-www-form-urlencoded" http://www.w3.org/TR/html401/interact/forms.html#adef-enctype – Roland May 17 '15 at 17:35
  • @Roland This question is specifically about an API context without a browser (i.e. HTTP, not necessarily HTML) – Matt Bridges May 19 '15 at 12:10
  • Using multipart you can send a part that is "form-urlencoded" too :P – SparK Sep 18 '17 at 18:03
  • Most user agents ("browsers") do not support uploading files unless `multipart/form-data` is used. The maximum overhead in bytes per `n`-byte parameter for `application/x-www-form-urlencoded` is around `2*n + 2` (percent encoding plus `&` plus `=`), whereas the overhead for `multipart/form-data` is around `50`. – Mikko Rantalainen Sep 25 '18 at 06:44
167

READ AT LEAST THE FIRST PARA HERE!

I know this is 3 years too late, but Matt's (accepted) answer is incomplete and will eventually get you into trouble. The key here is that, if you choose to use multipart/form-data, the boundary must not appear in the file data that the server eventually receives.

This is not a problem for application/x-www-form-urlencoded, because there is no boundary. x-www-form-urlencoded can also always handle binary data, by the simple expedient of turning one arbitrary byte into three 7BIT bytes. Inefficient, but it works (and note that the comment about not being able to send filenames as well as binary data is incorrect; you just send it as another key/value pair).

The problem with multipart/form-data is that the boundary separator must not be present in the file data (see RFC 2388; section 5.2 also includes a rather lame excuse for not having a proper aggregate MIME type that avoids this problem).

So, at first sight, multipart/form-data is of no value whatsoever in any file upload, binary or otherwise. If you don't choose your boundary correctly, then you will eventually have a problem, whether you're sending plain text or raw binary - the server will find a boundary in the wrong place, and your file will be truncated, or the POST will fail.

The key is to choose an encoding and a boundary such that your selected boundary characters cannot appear in the encoded output. One simple solution is to use base64 (do not use raw binary). In base64 3 arbitrary bytes are encoded into four 7-bit characters, where the output character set is [A-Za-z0-9+/=] (i.e. alphanumerics, '+', '/' or '='). = is a special case, and may only appear at the end of the encoded output, as a single = or a double ==. Now, choose your boundary as a 7-bit ASCII string which cannot appear in base64 output. Many choices you see on the net fail this test - the MDN forms docs, for example, use "blob" as a boundary when sending binary data - not good. However, something like "!blob!" will never appear in base64 output.
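The alphabet argument can be checked mechanically. The sketch below is only an illustration of the reasoning in this answer: the helper name `might_occur_in_base64` is hypothetical, and a `True` result means the boundary is not ruled out by the alphabet alone, not that it will definitely appear.

```python
import base64
import string

# Characters that base64 output can contain (ignoring MIME line breaks).
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def might_occur_in_base64(boundary: str) -> bool:
    """True if every character of boundary belongs to the base64 alphabet."""
    return all(ch in BASE64_ALPHABET for ch in boundary)


print(might_occur_in_base64("blob"))    # True
print(might_occur_in_base64("!blob!"))  # False

# "blob" is itself valid base64, so some 3-byte input encodes to exactly that.
colliding_input = base64.b64decode("blob")
print(base64.b64encode(colliding_input))  # b'blob'
```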

Cristian Ciupitu
EML
  • 54
    While one consideration with multipart/form-data is to ensure the boundary does not appear in the data, this is fairly simple to accomplish by choosing a boundary which is sufficiently long. Please do not use base64 encoding to accomplish this. A boundary which is randomly generated and the same length as a UUID should be sufficient: http://stackoverflow.com/questions/1705008/simple-proof-that-guid-is-not-unique. – Joshcodes Apr 30 '14 at 22:01
  • 2
    Better late than never. – devinbost Sep 24 '14 at 14:19
  • 1
    this does clear one's concept when combined with Matt's answer..apt! kudos ! – MrPandav Dec 06 '14 at 12:10
  • 23
    @EML, This doesn't make sense at all. Obviously the boundary is chosen automatically by the http client (browser) and the client will be smart enough not to use a boundary that clashes with the contents of your uploaded files. It's as simple as a substring match `index === -1`. – Pacerier Dec 11 '14 at 08:01
  • 13
    @Pacerier: (A) read the question: "no browser involved, API context". (B) browsers don't construct requests for you anyway. You do it yourself, manually. There's no magic in browsers. – EML Dec 11 '14 at 09:40
  • @EML, Ok, then we do it manually. is it so hard to do `while(true){r = rand(); if(data.indexOf(r) === -1){doStuff();break;}}` – Pacerier Dec 11 '14 at 09:51
  • 2
    @Pacerier: what language is this? Do you mean `Math.random()`? Look - if you want to use random numbers, fine. Generate a random 6-byte sequence, use it as your separator, and don't bother to check it, as Joshcodes suggests. As long as it's random, and not some arbitrary text sequence that you've just typed in, you'll be Ok. On the other hand, if you don't like statistical programming, and you can take the 17% overhead of base64, then just use a guaranteed separator such as `!blob!`. I'm not telling you how to do it - most people would use a random byte sequence in practice. Your call. – EML Dec 11 '14 at 10:54
  • 1
    @EML, No base64 is not needed. Random does fine as my above code shows. You [stated that this is a hard problem](http://stackoverflow.com/questions/4007969/application-x-www-form-urlencoded-or-multipart-form-data/23152871#comment35414353_4073451), so I'm asking you is it so hard to do `while(true){r = rand(); if(data.indexOf(r) === -1){doStuff();break;}}`? – Pacerier Dec 12 '14 at 02:45
  • 1
    @Pacerier: you're not reading anything before replying to it. I didn't state this was a hard problem. I stated that writing a parser for form-data was harder than something else. You started off by incorrectly stating that the browser automatically chooses a boundary, and have moved on to trying to get me to engage in a pointless argument. If you want to post an answer, just post it. – EML Dec 12 '14 at 22:07
  • 2
    @EML, Read your first paragraph again. You stated it "will eventually get you into trouble". The key here is that we will not get into trouble. 1) I'm correctly pointing out that your above claim is wrong, yet you still insist that we'll get into trouble. 2) I correctly stated that the browser automatically chooses a boundary; verify it yourself with a websniffer. 3) No one stated that you had stated this was a hard problem. I've asked a question and *still* have not gotten a reply: Is it so hard to do `while(true){r = rand(); if(data.indexOf(r) === -1){doStuff();break;}}`? – Pacerier Feb 05 '15 at 10:09
  • 3
    Regarding the `!blob!`: RFC 1341 says `!` is not an allowed character in the boundary. Only `0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'()+,-./:=?` can be used – BeniBela Apr 01 '15 at 12:33
  • 12
    @BeniBela, He's probably going to suggest to use `'()+-./:=` then. Yet random generation with substring check is still the way to go and it can be done with one line: `while(true){r = rand(); if(data.indexOf(r) === -1){doStuff();break;}}`. EML's suggestion (convert to base64 just to avoid matching substrings) is just plain odd, not to mention it comes with unneeded performance degradation. And all the trouble for nothing since the one line algorithm is equally straightforward and simple. Base64 is not meant to be (ab)used this way, as HTTP body [accept all 8-bit](http://goo.gl/L94Qcm) octets. – Pacerier Jul 27 '15 at 10:51
  • 36
    This answer not only adds nothing to the discussion, but also gives wrong advice. Firstly, whenever transmitting random data in separated parts, it is always possible that the chosen boundary will be present in the payload. The ONLY way to make sure this doesn't happen is to examine the entire payload for each boundary we come up with. Completely impractical. We just accept the _infinitesimal_ probability of a collision and come up with a reasonable boundary, like "---boundary--boundary---". Secondly, always using Base64 will waste bandwidth and fill up buffers for no reason at all. – vagelis May 05 '16 at 12:22
  • 3
    For the same reason that Avro data files are safe using long-enough sync markers (http://apache-avro.679487.n3.nabble.com/Synchronization-Markers-td4026016.html), the vanishingly-small chance of a data collision with 128 bits of truly random data is nothing to worry about, relative to the risk of your data center being consumed by catastrophe. – Wheezil Oct 03 '16 at 21:14
  • 1
    One can simply use UUID v4 as the boundary and, as the data is encoded, retry from the start if an identical string is found in the data during encoding. Unless the data to be transmitted is longer than `36*2^63` bytes, even a malicious attacker cannot force multiple retries to be needed for the encoding part. And considering that the data and the boundary are selected by the same process, there's very little point trying to attack this part. (And if you know that you're sending zebibytes worth of UUID strings, you can probably figure out a suitable alternative boundary string.) – Mikko Rantalainen Sep 25 '18 at 07:00
  • 1
    Please don't use `rand()` to generate yourself a boundary in PHP. By default this will generate you an integer in the range `[0, getrandmax()]` which on some platforms can be as small as `[0, 32767]`. An attacker could very trivially upload a file with all possible boundaries generated in this manner and blow up your application. I agree that this mechanism is silly and its persistence inexcusable, but please just use a GUID or something else that's agreed to be so strong an accidental collision is impossibly unlikely. – Wug Apr 29 '19 at 19:18
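The random-boundary approach argued for in the comments above (a long random string, optionally checked against the payload) might look like the following sketch. The function name `pick_boundary` and the retry limit are hypothetical; `secrets.token_hex` from the Python standard library supplies 128 random bits rendered as 32 hex characters.

```python
import secrets


def pick_boundary(payloads, max_attempts=10):
    """Return a random boundary that occurs in none of the given payloads."""
    for _ in range(max_attempts):
        candidate = secrets.token_hex(16)  # 32 hex characters, 128 random bits
        needle = candidate.encode("ascii")
        # Substring check; with 128 random bits this virtually never fails.
        if not any(needle in payload for payload in payloads):
            return candidate
    raise RuntimeError("could not find a boundary absent from the payloads")


payloads = [b"\x00\x01 raw binary \xff", b"plain text value"]
boundary = pick_boundary(payloads)
print(len(boundary))  # 32
```

Scanning the payload costs one extra pass over the data; skipping the check and relying on the randomness alone is the other option mentioned in the thread.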
99

I don't think HTTP POST is limited to multipart or x-www-form-urlencoded. The Content-Type header is orthogonal to the HTTP POST method (you can supply whichever MIME type suits you). This is also the case for typical webapps based on HTML representations (e.g. JSON payloads became very popular for ajax requests).

For RESTful APIs over HTTP, the most popular content types I have come across are application/xml and application/json.

application/xml:

  • data-size: XML is very verbose, but this is usually not an issue when using compression and considering that write access (e.g. through POST or PUT) is much rarer than read access (in many cases it is <3% of all traffic). There were rarely cases where I had to optimize write performance
  • existence of non-ascii chars: you can use utf-8 as encoding in XML
  • existence of binary data: would need to use base64 encoding
  • filename data: you can encapsulate this inside field in XML

application/json

  • data-size: more compact than XML, still text, but you can compress
  • non-ascii chars: json is utf-8
  • binary data: base64 (also see json-binary-question)
  • filename data: encapsulate as own field-section inside json
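The JSON points above can be sketched with the Python standard library. The field names `filename` and `content` and both helper names are assumptions made for this example, not a fixed convention.

```python
import base64
import json


def to_json_payload(filename: str, data: bytes) -> str:
    """Wrap binary data and its filename in a JSON document."""
    return json.dumps({
        "filename": filename,  # the filename travels as an ordinary field
        "content": base64.b64encode(data).decode("ascii"),  # binary as base64
    })


def from_json_payload(text: str):
    """Recover (filename, bytes) from a document built by to_json_payload."""
    doc = json.loads(text)
    return doc["filename"], base64.b64decode(doc["content"])


payload = to_json_payload("résumé.bin", b"\x00\xff\x10binary")
name, data = from_json_payload(payload)
```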

binary data as own resource

I would try to represent binary data as its own asset/resource. It adds another call but decouples things better. Example with images:

POST /images
Content-type: multipart/mixed; boundary="xxxx" 
... multipart data

201 Created
Location: http://imageserver.org/../foo.jpg  

In later resources you could simply inline the binary resource as link:

<main-resource>
 ...
 <link href="http://imageserver.org/../foo.jpg"/>
</main-resource>
Community
manuel aldana
  • Interesting. But when to use application/x-www-form-urlencoded and when multipart/form-data? – max Oct 25 '10 at 06:02
  • 4
    application/x-www-form-urlencoded is the default MIME type of your request (see also http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4). I use it for "normal" webforms. For APIs I use application/xml|json. multipart/form-data rings a bell when thinking of attachments (inside the response body, several data sections are concatenated with a defined boundary string). – manuel aldana Oct 25 '10 at 19:38
  • 5
    I think the OP was probably just asking about the two types that HTML forms use, but I'm glad this was pointed out. – tybro0103 Mar 21 '14 at 15:08
  • Did you ever try if browsers can submit form-fields e.g. with Json-Mime-type ? – Radon8472 Oct 25 '20 at 09:17
29

I agree with much that Manuel has said. In fact, his comments refer to this url...

http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4

... which states:

The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.

However, for me it would come down to tool/framework support.

  • What tools and frameworks do you expect your API users to be building their apps with?
  • Do they have frameworks or components they can use that favour one method over the other?

If you get a clear idea of your users, and how they'll make use of your API, then that will help you decide. If you make the upload of files hard for your API users then they'll move away, or you'll spend a lot of time on supporting them.

Secondary to this would be the tool support YOU have for writing your API and how easy it is for you to accommodate one upload mechanism over the other.

Joe Shaw
Martin Peck
  • 1
    Hi, does it mean that every time we post something to a web server, we have to state the Content-type in order to let the web server know how it should decode the data? Even if we craft the HTTP request ourselves, we MUST state the Content-type, right? – Sam YC Jul 17 '13 at 06:21
  • 2
    @GMsoF, It's optional. See http://stackoverflow.com/a/16693884/632951 . You may want to avoid using content-type when crafting a specific request for a specific server to avoid generic overheads. – Pacerier Dec 11 '14 at 08:10
2

Just a little hint from my side for uploading HTML5 canvas image data:

I am working on a project for a print-shop and had some problems due to uploading images to the server that came from an HTML5 canvas element. I was struggling for at least an hour and I did not get it to save the image correctly on my server.

Once I set the contentType option of my jQuery ajax call to application/x-www-form-urlencoded everything went the right way and the base64-encoded data was interpreted correctly and successfully saved as an image.
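For illustration only, the round trip might look like this in Python. It assumes the canvas data arrives as a `data:image/png;base64,...` data URL in a form field named `image`; neither detail is stated above, and the three sample bytes merely stand in for real image data.

```python
import base64
from urllib.parse import parse_qs, unquote_plus, urlencode

image_bytes = b"\xfb\xff\xfe"  # stand-in for PNG bytes; base64 form is "+//+"
data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")

# Client side: urlencode() percent-escapes '+', '/', ':', ';' and ','.
body = urlencode({"image": data_url})

# Server side: parse the form body, drop the data-URL prefix, decode base64.
received = parse_qs(body)["image"][0]
header, _, encoded = received.partition(",")
restored = base64.b64decode(encoded)
print(restored == image_bytes)  # True

# If '+' were sent unescaped, form decoding would turn it into a space.
damaged = unquote_plus("+//+")
print(repr(damaged))  # ' // '
```

The last two lines show one way such uploads get corrupted when the declared content type and the actual escaping of the body do not match.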


Maybe that helps someone!

BiJ
Torsten Barthel
  • 4
    What content type was it sending it before you changed it? This problem could have been due to the server not supporting the content type you were sending it as. – catorda Jan 13 '16 at 17:24
1

If you need to use Content-Type: application/x-www-form-urlencoded, then DO NOT use FormDataCollection as the parameter: in ASP.NET Core 2+, FormDataCollection has no default constructor, which the formatters require. Use IFormCollection instead:

public IActionResult Search([FromForm] IFormCollection type)
{
    return Ok();
}
jahansha