469

I have a question about multipart/form-data. In the HTTP header, I see Content-Type: multipart/form-data; boundary=???.

Is the ??? free to be defined by the user? Or is it generated from the HTML? Is it possible for me to define the ??? = abcdefg?

Yu Xiao
Questions
  • 2
    I found the answer here: http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2 – Questions Aug 19 '10 at 07:26
  • Related Q&A: [What if the form-data boundary is contained in the attached file?](https://stackoverflow.com/q/29539498/2718186) – MicroVirus Jun 17 '19 at 12:21
  • Does the boundary get uploaded to the server along with whatever data was posted, so the server automatically uses boundary string specified instead of the default "&" to separate the different values submitted? – ScotterMonkey Nov 24 '20 at 20:40

3 Answers

476

Is the ??? free to be defined by the user?

Yes.

or is it supplied by the HTML?

No. HTML has nothing to do with that. Read below.

Is it possible for me to define the ??? as abcdefg?

Yes.

If you want to send the following data to the web server:

name = John
age = 12

using application/x-www-form-urlencoded would be like this:

name=John&age=12

As you can see, the server knows that parameters are separated by an ampersand &. If & is required for a parameter value then it must be encoded.
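As a small illustration of the encoding rule above, here is a sketch using Python's standard urllib.parse module (Python is only my choice for illustration; the question itself is language-agnostic). The second call shows a value containing & being percent-encoded so that it cannot be mistaken for a separator.

```python
from urllib.parse import urlencode, parse_qs

# Two simple fields: the pairs are joined with "&".
simple = urlencode({"name": "John", "age": 12})

# A value that itself contains "&" must be percent-encoded (as %26).
tricky = urlencode({"name": "John & Jane"})

print(simple)            # name=John&age=12
print(tricky)            # name=John+%26+Jane
print(parse_qs(tricky))  # {'name': ['John & Jane']}
```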

So how does the server know where a parameter value starts and ends when it receives an HTTP request using multipart/form-data?

Using the boundary, similar to &.

For example:

--XXX
Content-Disposition: form-data; name="name"

John
--XXX
Content-Disposition: form-data; name="age"

12
--XXX--

In that case, the boundary value is XXX. You specify it in the Content-Type header so that the server knows how to split the data it receives.

So you need to:

  • Use a value that won't appear in the HTTP data sent to the server.

  • Be consistent and use the same value everywhere in the request message.
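To make the two rules concrete, here is a minimal Python sketch that builds the request body for simple text fields. The helper name build_multipart is hypothetical, and the sketch assumes a boundary made only of characters that need no quoting in the header; it is not how any particular client library does it.

```python
def build_multipart(fields, boundary):
    """Return (content_type, body) for plain text fields (hypothetical helper)."""
    delimiter = "--" + boundary  # the two leading hyphens come from the protocol
    for value in fields.values():
        # Rule 1: the delimiter must not occur inside the data itself.
        if delimiter in value:
            raise ValueError("boundary appears in the data; pick another one")

    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(delimiter + "--")  # closing delimiter has two extra hyphens
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")

    # Rule 2: the very same boundary value goes into the Content-Type header.
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, body


content_type, body = build_multipart({"name": "John", "age": "12"}, "XXX")
print(content_type)
print(body.decode("utf-8"))
```

With the boundary XXX this should reproduce the body shown above. The Content-Length of such a request would be len(body), so it changes whenever the boundary length changes.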

Borodin
Oscar Mederos
  • 66
    You have to add an extra "--" at the end of the closing boundary. – Sebastian Piskorski Jan 15 '14 at 10:56
  • 14
    You can read it in the documentation. The closing boundary has to end with two extra hyphens "--". Link: http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2 – Sebastian Piskorski Jan 15 '14 at 21:38
  • 7
    Great answer. A boundary is just the 'key' to separate the multiple "parts" of a multipart payload. Normally something like '&' is enough to separate the variables but you need something more unique to separate the payloads within the payload. – user2483724 Mar 18 '14 at 18:30
  • Note: Content-Length should be changed when the boundary changes – K3rnel31 Apr 16 '14 at 10:24
  • 1
    @K3rnel31 Of course, unless the new boundary string has the same length. – Oscar Mederos Apr 16 '14 at 19:00
  • 5
    I think that the boundary value as declared in the Content-Type header will actually be -XXX--- because an extra "--" should be written when separating the parts (hence the ---XXX---) – Theodore K. Mar 17 '15 at 12:07
  • 2
    Would be clearer if you didn't include any dashes in the boundary, since that would clearly show which dashes are included because of the protocol. – Chet Feb 23 '16 at 16:05
  • Note, however, that if you're using the FormData object from JavaScript, it will set its own boundary irrespective of the value in the Content-Type header. – e18r Aug 10 '16 at 22:56
  • @e18r What should we do with it though? I'm facing the error too but couldn't get over it – Loi Nguyen Huynh Apr 05 '21 at 19:22
  • @LoiNguyenHuynh I don't remember what I did. That comment is 5 years old lol sorry – e18r Apr 09 '21 at 20:43
  • 1
    @e18r My answer was "don't set the header yourself" – Loi Nguyen Huynh Apr 11 '21 at 11:54
119

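The exact answer to the question is: yes, you can use an arbitrary value for the boundary parameter, provided it is 1 to 70 characters long and consists only of the characters RFC 2046 permits: 7-bit US-ASCII letters, digits, and the punctuation '()+_,-./:=? (a space is also allowed, but not as the last character).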

If you are using one of the multipart/* content types, you are actually required to specify the boundary parameter in the Content-Type header; otherwise the server (in the case of an HTTP request) will not be able to parse the payload.

You probably also want to set the charset parameter to UTF-8 in your Content-Type header, unless you can be absolutely sure that only US-ASCII charset will be used in the payload data.

A few relevant excerpts from RFC 2046:

  • 4.1.2. Charset Parameter:

    Unlike some other parameter values, the values of the charset parameter are NOT case sensitive. The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.

  • 5.1. Multipart Media Type

    As stated in the definition of the Content-Transfer-Encoding field [RFC 2045], no encoding other than "7bit", "8bit", or "binary" is permitted for entities of type "multipart". The "multipart" boundary delimiters and header fields are always represented as 7bit US-ASCII in any case (though the header fields may encode non-US-ASCII header text as per RFC 2047) and data within the body parts can be encoded on a part-by-part basis, with Content-Transfer-Encoding fields for each appropriate body part.

    The Content-Type field for multipart entities requires one parameter, "boundary". The boundary delimiter line is then defined as a line consisting entirely of two hyphen characters ("-", decimal value 45) followed by the boundary parameter value from the Content-Type header field, optional linear whitespace, and a terminating CRLF.

    Boundary delimiters must not appear within the encapsulated material, and must be no longer than 70 characters, not counting the two leading hyphens.

    The boundary delimiter line following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter line is identical to the previous delimiter lines, with the addition of two more hyphens after the boundary parameter value.
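The length and character constraints quoted above can be checked mechanically. The following is a rough Python sketch of such a check, based on my reading of the boundary grammar in RFC 2046; the function name is_valid_boundary is made up for illustration.

```python
import re

# Characters RFC 2046 allows in a boundary: letters, digits, and '()+_,-./:=?
# A space is also allowed, except as the final character.
_BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
_BOUNDARY_RE = re.compile(rf"[{_BCHARS_NOSPACE} ]{{0,69}}[{_BCHARS_NOSPACE}]")


def is_valid_boundary(value: str) -> bool:
    """Return True if value is 1 to 70 allowed characters, not ending in a space."""
    return _BOUNDARY_RE.fullmatch(value) is not None


print(is_valid_boundary("abcdefg"))                # True
print(is_valid_boundary("another cool boundary"))  # True
print(is_valid_boundary("x" * 71))                 # False, too long
print(is_valid_boundary("trailing space "))        # False
```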

Here is an example using an arbitrary boundary:

Content-Type: multipart/form-data; charset=utf-8; boundary="another cool boundary"

--another cool boundary
Content-Disposition: form-data; name="foo"

bar
--another cool boundary
Content-Disposition: form-data; name="baz"

quux
--another cool boundary--
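To see how a receiver can use the boundary to split such a payload, here is a small sketch that feeds the example above to the MIME parser in Python's standard email package. This is only an illustration of the splitting; real web frameworks ship their own multipart parsers.

```python
from email import message_from_string

# The example from above, with explicit CRLF line endings.
raw = (
    'Content-Type: multipart/form-data; charset=utf-8; '
    'boundary="another cool boundary"\r\n'
    "\r\n"
    "--another cool boundary\r\n"
    'Content-Disposition: form-data; name="foo"\r\n'
    "\r\n"
    "bar\r\n"
    "--another cool boundary\r\n"
    'Content-Disposition: form-data; name="baz"\r\n'
    "\r\n"
    "quux\r\n"
    "--another cool boundary--\r\n"
)

message = message_from_string(raw)

# Each part carries its field name in the Content-Disposition header.
fields = {
    part.get_param("name", header="content-disposition"): part.get_payload()
    for part in message.get_payload()
}

print(message.get_boundary())
print(fields)
```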
antichris
  • 2
    I like this answer most because it quotes from RFC about how **hyphens** are specified. – Rick Mar 15 '20 at 13:51
  • @Rick There's a valid reason for IETF to do that — although they all look pretty much the same, only one of the following four is the correct hyphen character: ˗ ‐ - ‑ – antichris Mar 16 '20 at 19:32
  • Ha, when I said hyphens, I meant that your answer told me which hyphens are defined in the standard. I was confused about which hyphens are "client defined" and which are "specification defined" – Rick Mar 17 '20 at 04:40
50

A multipart/form-data body uses a boundary to separate name/value pairs. The boundary marks each chunk of name/value data sent when a form is submitted. The browser generates it and adds it to the Content-Type request header automatically.

A form with the enctype="multipart/form-data" attribute will send a request header such as Content-Type: multipart/form-data; boundary=---WebKit193844043-h (a browser-generated value).

The payload passed looks something like this:

Content-Type: multipart/form-data; boundary=---WebKitFormBoundary7MA4YWxkTrZu0gW

    -----WebKitFormBoundary7MA4YWxkTrZu0gW
    Content-Disposition: form-data; name="file"; filename="captcha"
    Content-Type:

    -----WebKitFormBoundary7MA4YWxkTrZu0gW
    Content-Disposition: form-data; name="action"

    submit
    -----WebKitFormBoundary7MA4YWxkTrZu0gW--
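Clients typically pick a long random boundary so that it is very unlikely to collide with the uploaded data. A hedged Python sketch of that idea follows; the prefix and the function name make_boundary are made up for illustration and are not what any browser actually uses.

```python
import secrets


def make_boundary(prefix: str = "----SketchBoundary") -> str:
    """Return a random boundary: a made-up prefix plus 32 random hex characters."""
    return prefix + secrets.token_hex(16)


boundary = make_boundary()
header = f"Content-Type: multipart/form-data; boundary={boundary}"
print(header)
```

The result is 50 characters long, within the 70-character limit, and uses only hyphens, letters, and digits.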

On the web service side (JAX-RS, for example), the endpoint is annotated with @Consumes("multipart/form-data") to accept it.

Beware: when testing your web service with the Chrome Postman app, select the form-data option (radio button) and choose File from the dropdown box to send an attachment. Explicitly setting Content-Type to multipart/form-data causes an error because the boundary is then missing: your header overrides the one Postman would send. If you leave the header unset, Postman appends the boundary to the Content-Type itself, which works fine.

See RFC 1341, section 7.2, The Multipart Content-Type

Victor Zamanian
Yergalem