7

Before any of you rush to cast a close vote, let me say that I understand this question may be subjective, and that the expected answer may begin with "it depends". Nevertheless, it is a real problem I run into, as I am creating more and more graphs, and I don't necessarily know exactly how I am going to use them, or don't have the time to test for the final use case immediately.

So I am leveraging the experience of SO R users to get good reasons to choose one over the others among jpeg(), bmp(), png(), tiff() and pdf(), and possibly which options to use. I don't have enough experience in R or knowledge of the different formats to choose wisely.

Potential use cases:

  • quick look after or during run time of algorithms
  • presentations (.ppt mainly)
  • reports (word or latex)
  • publication (internet)
  • storage (without too much loss and to transform it later for a specific use)
  • anything relevant I forgot

Thanks! I'm happy to make the question clearer.

Antoine Lizée
  • The question is probably still a bit subjective, but I do get your point. Are you trying to get at what is essentially the 'most versatile' format that covers your use cases? If so, I would suggest that `?svg` is also worth including in consideration. Any raster image (jpg/png) just loses too much info. Also, is there any particular reason why you can't just store the code to generate said graphs on demand? – thelatemail Sep 06 '13 at 01:51
  • Relevant: http://stackoverflow.com/questions/18037102/matlab-fig-equivalent-in-r – Andy Clifton Sep 06 '13 at 03:35
  • I'm still hoping for the day when everyone switches to `.svg` :-( . But I have to agree with the close-votes: this is a very broad question and competes with "emacs vs. vim" for most-often-flamefested. – Carl Witthoft Sep 06 '13 at 11:47
  • @CarlWitthoft and @thelatemail: I didn't know about svg(), and should have included it in my list. – Antoine Lizée Sep 06 '13 at 18:51

3 Answers

11

To expand a little on my comment, there is no real easy answer, but my suggestions:

  1. My first totally flexible choice would be to simply store the final raw data used in the plot(s) and a bit of R code for generating the plot(s). That way you could easily enough send the output to whatever device that suits your particular purpose. It would not be that arduous a task to set yourself up a couple of basic templates based on png()/pdf() that you could call upon.

  2. Use the svg() device. As noted by @gung, pdf(), svg(), cairo_ps() and cairo_pdf() are your only real options for retaining scalable vector images. I would tend to lean towards svg() rather than pdf() due to the greater editing options available in programs like Inkscape. It is also becoming quite widely supported for internet publication (see http://caniuse.com/svg).

  3. If, on the other hand, you're a LaTeX user, most headaches seem to be solved by going straight to pdf(). You can usually import and convert pdf files using Inkscape or command-line utilities like ImageMagick if you have to change formats.

  4. For Word/PowerPoint interaction, if you are running R on Windows, you can also export directly using win.metafile(), which will give you scalable emf images made of editable components that you can import into Word or PowerPoint directly. I have heard of people running R through Wine or using intermediary steps on Linux to get emf files out for later use. For Mac, there are roundabout pathways as well.
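As a rough illustration of point 1, here is a minimal sketch of such a template. `save_plot()`, `draw()` and the file names are hypothetical names made up for this example, not part of base R; only `pdf()`, `svg()`, `png()`, `saveRDS()` and `readRDS()` are real functions. It assumes the plot can be wrapped in a function that takes no arguments.

```r
# Hypothetical helper: render the same plotting code to one or more devices.
# Sizes are in inches; `res` only matters for the raster (png) device.
save_plot <- function(plot_fun, basename, formats = c("pdf", "png"),
                      width = 7, height = 5, res = 300) {
  files <- character(0)
  for (fmt in formats) {
    file <- paste0(basename, ".", fmt)
    switch(fmt,
      pdf = pdf(file, width = width, height = height),
      svg = svg(file, width = width, height = height),
      png = png(file, width = width, height = height, units = "in", res = res),
      stop("unsupported format: ", fmt)
    )
    # Always close the device, even if the plotting code fails.
    tryCatch(plot_fun(), finally = dev.off())
    files <- c(files, file)
  }
  invisible(files)
}

# Store the final data used in the plot next to the code that draws it.
plot_data <- data.frame(x = 1:20, y = (1:20)^2)
data_file <- file.path(tempdir(), "plot_data.rds")
saveRDS(plot_data, data_file)

# The plotting code reads the stored data, so it can be rerun at any time.
draw <- function() {
  d <- readRDS(data_file)
  plot(d$x, d$y, type = "b", xlab = "x", ylab = "y")
}

out <- save_plot(draw, file.path(tempdir(), "example"), formats = "pdf")
```

Passing `formats = c("pdf", "svg", "png")` would write all three from the same code, provided your R build supports those devices (`capabilities()` reports this for png and cairo).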

So, to summarise, in order of preference:

  1. Don't store images at all; store the code to generate them.
  2. Use svg/pdf and convert formats as required.
  3. Keep a direct win.metafile() export as a backup for those cases where you can't escape using Word/PowerPoint and are primarily going to be working on Windows systems.
thelatemail
  • Thanks a lot for this answer. I forgot to specify it, but one of my current uses for outputting such graphics is during algorithm runtime. I **cannot** relaunch the one-day algo just to recreate the plots, nor can I store the parameters to replot them (or I could, but I don't want to write the extra code to add this feature to my program). So the bottom line is that this question is especially interesting when your **1.** does not apply. Your answer and Manetheran's together give the information needed to make the right choice. Thanks. – Antoine Lizée Sep 06 '13 at 18:48
  • On Linux systems, what would you recommend for later use in ppt or Word? It seems that pdf is okay? – Antoine Lizée Sep 06 '13 at 18:55
  • @AntoineLizée - pdf/svg are probably still your best bets - you can easily enough generate a high-resolution raster or wmf file from either using Inkscape or a similar tool. – thelatemail Sep 06 '13 at 23:59
5

So far the answers to this question have all recommended outputting plots in vector-based formats. This will give you the best output, allowing you to resize your image as needed for whatever medium it will end up in (whether that be a webpage, document, or presentation), but this comes at a computational cost.

For my own work, I often find it is much more convenient to save my plots in a raster format of sufficient resolution. You probably want to do this whenever your data takes a non-trivial amount of time to plot.

Some examples of where I find a raster format is more convenient:

  1. Manhattan plots: A plot showing p-value significance for hundreds of thousands to millions of DNA markers across a genome.
  2. Large Heatmaps: Clustering the top 5000 differentially expressed genes between two groups of people, one with a disease, and one healthy.
  3. Network Rendering: When drawing a large number of nodes connected to each other by edges, redrawing the edges (as vectors) can slow down your computer.

Ultimately it comes down to a trade-off in your own sanity. What annoys you more: your computer grinding to a halt trying to redraw an image, or figuring out the exact dimensions to render an image in raster format so it doesn't look awful in your final publishing medium?
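For the "exact dimensions" part, the arithmetic is just physical size times resolution. A minimal sketch, assuming a 6 x 4 inch target at 300 dpi (both numbers are arbitrary examples, and `pixels_for()` is a hypothetical helper, not a base R function):

```r
# Hypothetical helper: pixels needed for a given physical size and resolution.
pixels_for <- function(width_in, height_in, dpi = 300) {
  c(width = round(width_in * dpi), height = round(height_in * dpi))
}

px <- pixels_for(6, 4, dpi = 300)

# Draw a dense, Manhattan-like scatter plot straight to a raster file.
# Only attempted if this R build reports a png device.
if (capabilities("png")) {
  png_file <- file.path(tempdir(), "manhattan_like.png")
  # `res` tells the device how many pixels make an inch, so text and
  # point sizes are scaled to match the intended physical size.
  png(png_file, width = px[["width"]], height = px[["height"]], res = 300)
  set.seed(1)
  plot(seq_len(1e5), -log10(runif(1e5)), pch = ".",
       xlab = "marker index", ylab = "-log10(p)")
  dev.off()
}
```

Equivalently, `png()` accepts `units = "in"` together with `res`, which lets you give the size in inches and skip the multiplication.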

Scott Ritchie
  • This doesn't really apply to me as my graphs are complex but not really rich in elements, but this is a really interesting point and a nice addition to the answers. Thanks! – Antoine Lizée Sep 06 '13 at 18:50
  • Any preference between all the rasters? time-wise/size-wise/later-usability-wise? – Antoine Lizée Sep 06 '13 at 19:06
  • Use `png` unless you have a very good reason not to. It's far superior to the other raster formats in both file-size and later re-use. See a discussion on image formats here: http://stackoverflow.com/questions/2336522/png-vs-gif-vs-jpeg-when-best-to-use – Scott Ritchie Sep 06 '13 at 22:25
1

The most basic distinction to bear in mind here is raster graphics versus vector graphics. In general, vector graphics will preserve options for you later. Of the options you listed, jpeg, bmp, tiff, and png are raster formats; only pdf will give you vector graphics. Thus, that is probably the best default of your listed options.
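One way to see the distinction concretely: a vector file records every element that was drawn, so its size grows with the number of points, whereas a raster file's size is bounded by its pixel dimensions. A minimal sketch, where `pdf_size()` is a hypothetical helper written for this illustration:

```r
# Hypothetical helper: size in bytes of a PDF scatter plot with n points.
pdf_size <- function(n) {
  file <- tempfile(fileext = ".pdf")
  on.exit(unlink(file))  # remove the temporary file when the function exits
  pdf(file, width = 6, height = 4)
  plot(seq_len(n), sin(seq_len(n) / 50), pch = ".",
       xlab = "index", ylab = "value")
  dev.off()
  file.size(file)
}

small <- pdf_size(100)     # a handful of points
large <- pdf_size(100000)  # many points: every one is stored in the file
```

Comparing `small` and `large` shows how much the vector file grows with the point count; the exact byte counts depend on the R version and PDF compression settings.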

gung - Reinstate Monica