15

What is the convention to deliver a binary resource (like a pdf file) with a REST API? Do you just return a URL to the resource in your JSON or XML response, e.g., {"url" : "http://example.com/document.pdf"} ?

I'm trying to understand the difference between a URI and URL and keep with a RESTful philosophy. Admittedly, this is new to me so I may be misunderstanding some things.

adamkrell

3 Answers

13

This Section Assumes You Mean: How Do I Tell The User Where to Find a Binary Resource

The difference between a URI and a URL doesn't have anything to do with binary vs. non-binary datatypes (see also).

If you're returning mostly JSON, then a url entry is a common way to go. If you're doing something more HTML/XML-ish, then something like a <link> element with a good rel attribute makes a lot of sense.
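As a rough illustration of the two options above, here is a minimal sketch in Python. The function names, the `/invoices/<id>.pdf` URL layout, and the choice of `rel="alternate"` are assumptions made for the example, not anything prescribed by the question.

```python
import json


def json_with_url(invoice_id: str, base_url: str) -> str:
    """Build a JSON body that points at the PDF instead of embedding it.

    The "/invoices/<id>.pdf" path is a hypothetical URL layout.
    """
    body = {
        "id": invoice_id,
        "url": f"{base_url}/invoices/{invoice_id}.pdf",
    }
    return json.dumps(body)


def link_element(invoice_id: str, base_url: str) -> str:
    """Build an HTML/XML-style <link> element pointing at the same PDF.

    rel="alternate" is one plausible choice of rel value, assumed here.
    """
    href = f"{base_url}/invoices/{invoice_id}.pdf"
    return f'<link rel="alternate" type="application/pdf" href="{href}"/>'


print(json_with_url("42", "http://example.com"))
print(link_element("42", "http://example.com"))
```

Either way, the client gets a location to GET rather than the bytes themselves.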

Obviously, if the client makes a GET request to the direct URL you gave them, then you should send them the file, unless they sent a bunch of content-negotiation headers that effectively preclude you from fulfilling their request. In that case, a 406 Not Acceptable response (or the official definition) makes a lot of sense.

If you meant something else by your question, please clarify.

A Rambling "Do It Like This" Section

First: ignore URL vs. URI. It doesn't have anything to do with this. At all.

Next: If your problem is not "How do I link to a resource" (which might be affected by the stuff I'm about to discuss), but "What if my resource is just a PDF file", you have all sorts of options for addressing it. First off, you need to step back and think more abstractly (a little). Your resource is almost certainly not "a PDF file". It's "a file uploaded by a user", or "a PDF version of a report that I generate", etc.

In the first case, you probably don't have any representation of the resource beyond the binary they sent you, which is totally fine. You probably won't need to perform any sort of content-negotiation when you receive a GET to the URL for that resource. Just send them the file, subject to the caveats about 406 I mention above.

In the second case, you might have all sorts of representations of this resource: CSV, HTML, LaTeX, you name it. In this case, when you receive a GET to the URL for the resource, you do need to do some content-negotiation, so you know whether to send them the PDF document, or something else. It's possible that you might have a JSON representation of the resource that is just the raw data you use to generate the PDF.
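To make the content-negotiation step a bit more concrete, here is a simplified sketch in Python. It is not a full implementation of the HTTP rules: it honors q-values and wildcards but ignores the precedence of more specific media ranges, and the names `parse_accept`, `matches`, `negotiate`, and `AVAILABLE` are invented for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical list of representations the server can produce, in
# order of server preference.
AVAILABLE = ["application/json", "application/pdf", "text/csv"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the client does not specify it
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    # sorted() is stable, so ties keep the order the client listed them in.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """Check whether a media range such as "text/*" covers a media type."""
    if media_range == "*/*":
        return True
    range_type, _, range_subtype = media_range.partition("/")
    main_type = media_type.partition("/")[0]
    if range_subtype == "*":
        return range_type == main_type
    return media_range == media_type


def negotiate(accept_header: str, available: List[str]) -> Tuple[int, Optional[str]]:
    """Return (status, media_type): 200 with a chosen type, or 406 with None."""
    if not accept_header.strip():
        return 200, available[0]  # no preference stated: use the server default
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means the client refuses this range
        for media_type in available:
            if matches(media_range, media_type):
                return 200, media_type
    return 406, None


print(negotiate("application/pdf", AVAILABLE))
print(negotiate("text/html;q=0.9, application/json;q=0.5", AVAILABLE))
print(negotiate("image/png", AVAILABLE))
```

In the "file uploaded by a user" case, `available` would hold a single media type, so the only real outcomes are sending the file or answering 406.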

In either case, it would be unexpected if you had a representation that was strictly metadata about the resource. If needed (often it is, sometimes it isn't), explicit, external metadata (as opposed to metadata embedded within the binary resource, such as author and title info in PDFs) is most commonly modeled as a separate resource.

Finally, as @monitorjbl says: you probably don't want to embed the binary data directly in a text format such as JSON or XML. There are ways of doing it, often involving the words "base64-encoded", but it's usually not the best approach. In general, you shouldn't mix binary data and text data.

Hank Gay
  • That's what I mean, mostly. I could just spit out the pdf when a GET request comes, but that doesn't seem RESTful. I'm assuming that you should only return "representations" of the resource rather than the resource itself (again, I'm probably misunderstanding something here). That's why I'm a little confused about URI and URL. Your link is helpful, but I'm still trying to clarify it. – adamkrell Aug 29 '12 at 20:21
  • @Drinian I updated my answer; hopefully it is more useful, now. – Hank Gay Aug 29 '12 at 21:36
  • Thank you. There is a data representation of the pdf, which is an invoice, but the pdf has unique data (a signature). I assume that this means the best solution is just to send back a URL that points to the pdf? If so, then that URL isn't considered part of the API, correct? – adamkrell Aug 29 '12 at 22:58
  • Or should I have the GET request send a different Accept header depending on whether the user wants the plain data or the actual pdf? – adamkrell Aug 29 '12 at 23:09
  • @Drinian just to make sure I'm clear: you have an invoice that can be viewed in "raw data" form, or in PDF form? I'd lean toward content negotiation, but since the PDF has a signature (a scan of a physical invoice?) it's not open-and-shut in my mind. It might help to talk with the people who will be consuming your service to see what they prefer. – Hank Gay Aug 30 '12 at 02:59
  • @Drinian hope things go well. – Hank Gay Aug 30 '12 at 13:25
6

Binary or not, your REST resources should be described with hypermedia types.

  • if your REST clients PUT/POST resources in msgpack format, the REST server can still read this message and update/create the resource. So why not.
  • if your REST clients PUT/POST resources in PDF format, my guess is you won't be able to extract all the information you need to create/update a resource properly. So, no.

In that last case, you may be dealing with a "Google Drive"-like service: those PDFs aren't your resources per se, and should be linked from your actual resource (i.e. the URL should be within your resource).

Even if Google Drive may not be the perfect REST API (API reference), it's dealing with both JSON resources and actual binary files.

Brian Clozel
3

In my experience, embedding the binary data directly in the XML/JSON response would be antithetical to the idea of a REST web service. You can't cache such a response without serious headaches, unlike with traditionally RESTful services. Also, since you have to consume the service as text in order to read your XML/JSON, you probably won't be able to optimize for both text and binary reads. Not to mention, you would need the binary data on every request, or you'd take a pretty significant performance hit whenever you only wanted the text data. And if you always need the binary data, maybe ask yourself why you need the web service at all?

This is not to say it's impossible (there is BSON, after all) or that the use case for this is nonexistent, but you should make very sure that you can't get away with forcing a separate request for the binary data before you attempt to do this. Embedding binary data into a document format designed for text is very inefficient, and your data will be much larger in this form than if it were just raw bytes.
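To put a number on the size penalty, here is a small sketch in Python that compares raw bytes with their base64 form, which is the usual way to carry binary data inside JSON. The helper name `base64_overhead`, the 30,000-byte payload, and the `"data"` field name are arbitrary choices for the example.

```python
import base64
import json
import os


def base64_overhead(raw: bytes) -> float:
    """Return the size of the base64 encoding relative to the raw bytes."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


# Stand-in for a binary file; random bytes behave like compressed PDF content.
raw = os.urandom(30_000)

# base64 emits 4 output characters for every 3 input bytes.
encoded = base64.b64encode(raw).decode("ascii")
payload = json.dumps({"filename": "document.pdf", "data": encoded})

print(len(raw))              # size of the raw bytes
print(len(encoded))          # size after base64 encoding
print(len(payload))          # size of the full JSON document
print(base64_overhead(raw))  # ratio of encoded size to raw size
```

The encoded form is about a third larger than the raw bytes before any JSON wrapping, and the client still has to decode it before it can use the file.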

As an aside, if you are always doing this with a vector graphic resource like SVG or certain types of PDFs, you can represent that as XML data. Again, you may not want to, as it will increase your payload, but it's an option to get around the "needing binary" thing.

monitorjbl