
In a previous question, I asked whether a convenient wrapper exists in base R to format numbers as percentages.

This elicited three responses:

  1. Probably not.
  2. Such a wrapper would be too narrow to be useful. It is better that useRs learn how to use existing tools, such as sprintf, which can format numbers in a highly flexible way.
  3. Such a wrapper is problematic, anyway, since you lose the ability to perform calculations on the object.

Still, in my view sprintf() is a little too cryptic for R beginners to learn, unless they come from a C background. Perhaps a better solution is to modify format() or prettyNum() to accept prefixes and suffixes, so you could easily create percentages, currencies, degrees, etc.
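For comparison, here is a small sketch of the two styles under discussion: the C-style sprintf() format string that respondents recommended, and an equivalent built from round(), format() and paste0(). The example values are made up for illustration.

```r
# Proportions to display as percentages with one decimal place (illustrative values)
x <- c(0.1234, 0.5678, 0.9)

# sprintf(): "%.1f" means fixed notation with one decimal place,
# and "%%" is an escaped literal percent sign.
pct_sprintf <- sprintf("%.1f%%", x * 100)

# The same strings assembled from more "R-like" building blocks:
# round to one decimal, force one decimal with nsmall, then append "%".
pct_paste <- paste0(format(round(x * 100, 1), nsmall = 1), "%")

pct_sprintf
pct_paste
```

Both should give "12.3%" "56.8%" "90.0%"; the second is longer, but each step says what it does in words rather than in format-string codes.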


Question:

How would you design a function, class or set of functions to elegantly deal with formatting numbers as percentages, currencies, degrees, etc?

Andrie

4 Answers


I would probably keep things very simple. format() is generally useful for most basic formatting needs. I would extend that with a simple wrapper that allowed arbitrary prefix and suffix strings. Here is a simple version:

formatVal <- function(x, prefix = "", suffix = "", sep = "", collapse = NULL,
                      ...) {
    x <- format(x, ...)
    x <- paste(prefix, x, suffix, sep = sep, collapse = collapse)
    x
}

If I were doing this for real, I would probably not have the collapse argument in the definition of formatVal(), but instead process it out of ..., but for illustration I kept the above function simple.

Using:

set.seed(1)
m <- runif(5)

Some simple examples of usage:

> formatVal(m*100, suffix = "%")
[1] "26.55087%" "37.21239%" "57.28534%" "90.82078%" "20.16819%"
> formatVal(m*100, suffix = "%", digits = 2)
[1] "27%" "37%" "57%" "91%" "20%"
> formatVal(m*100, suffix = "%", digits = 2, nsmall = 2)
[1] "26.55%" "37.21%" "57.29%" "90.82%" "20.17%"
> formatVal(m, prefix = "£")
[1] "£0.2655087" "£0.3721239" "£0.5728534" "£0.9082078" "£0.2016819"
> formatVal(m, prefix = "£", digits = 1)
[1] "£0.3" "£0.4" "£0.6" "£0.9" "£0.2"
> formatVal(m, prefix = "£", digits = 1, nsmall = 2)
[1] "£0.27" "£0.37" "£0.57" "£0.91" "£0.20"
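The answer mentions that a production version would take collapse out of `...` rather than declaring it as a formal argument. A minimal sketch of that idea (not part of the original answer; the name formatVal2 is hypothetical) might look like this:

```r
formatVal2 <- function(x, prefix = "", suffix = "", sep = "", ...) {
    args <- list(...)
    # Extract 'collapse' (exact name match via [[ ]]) so it is not passed to format()
    collapse <- args[["collapse"]]
    args[["collapse"]] <- NULL
    # All remaining arguments in ... are forwarded to format()
    x <- do.call(format, c(list(x), args))
    paste(prefix, x, suffix, sep = sep, collapse = collapse)
}

formatVal2(c(12.5, 50), suffix = "%", nsmall = 1, collapse = ", ")
```

The call above should return the single string "12.5%, 50.0%". Declaring collapse as a formal argument, as in the original, is simpler and self-documenting; pulling it out of `...` just keeps the signature focused on formatting.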
Gavin Simpson
  • I think having `sep = ""` is appropriate, probably necessary, since otherwise `paste()` would fall back to its default `sep = " "`, and I can't think of a case where that would be appropriate. Also, when doing it for real, it would probably be sensible to have a separate `sep` for prefix and suffix (prefix generally a space, suffix generally an empty string) – Andrie Aug 23 '11 at 07:27
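Following the suggestion in the comment above, a sketch with separate separators for the prefix and the suffix might look like this. The function name formatVal3 and the arguments prefixSep and suffixSep are hypothetical; the defaults follow the comment (a space after the prefix, nothing before the suffix).

```r
formatVal3 <- function(x, prefix = "", suffix = "",
                       prefixSep = " ", suffixSep = "", ...) {
    # trim = TRUE avoids padding to a common width; other arguments go to format()
    x <- format(x, trim = TRUE, ...)
    # Only insert a separator next to a prefix/suffix that is actually present
    pre  <- if (nzchar(prefix)) paste0(prefix, prefixSep) else ""
    post <- if (nzchar(suffix)) paste0(suffixSep, suffix) else ""
    paste0(pre, x, post)
}

formatVal3(c(5, 10), prefix = "EUR")       # "EUR 5" "EUR 10"
formatVal3(45, suffix = "%")               # "45%"
formatVal3(7, prefix = "$", prefixSep = "") # "$7"
```

Because trim is set inside the function, passing trim again through `...` would be an error; a fuller version could check for that.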
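print.formatted <- function(x, ...)
{
   # Build a sprintf() format string such as "%.3f" from the stored precision
   fmt <- paste("%.", attr(x, "precision"), "f", sep = "")
   # Scale the bare numbers (as.numeric() drops the attributes) and format them
   vals <- sprintf(fmt, as.numeric(x) * attr(x, "scaleFactor"))
   # print() displays the strings and invisibly returns them
   print(paste(attr(x, "prefix"), vals, attr(x, "suffix"), sep = ""))
}

as.percent <- function(x, precision = 3)
{
  # The raw proportions are kept; only the printed representation changes
  class(x) <- c(class(x), "formatted")
  attr(x, "scaleFactor") <- 100
  attr(x, "prefix") <- ""
  attr(x, "suffix") <- "%"
  attr(x, "precision") <- precision
  return(x)
}

as.currency <- function(x, prefix = "£")
{
  class(x) <- c(class(x), "formatted")
  attr(x, "scaleFactor") <- 1
  attr(x, "prefix") <- prefix
  attr(x, "suffix") <- ""
  attr(x, "precision") <- 2
  return(x)
}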

as.percent(runif(3))
[1] "21.585%" "12.396%" "37.744%"

x <- as.currency(rnorm(3,500,100))
x
[1] "£381.93" "£339.49" "£521.74"
2*x
[1] "£763.86"  "£678.98"  "£1043.48"
James
  • One infelicity is that you are hard-coding the scale factor in `as.percent()`. What if I have numbers that already are % but just want the `"%"` appending? Another issue is that you don't get the formatted strings, they are only ever printed. – Gavin Simpson Aug 22 '11 at 14:07
  • @Gavin Simpson It's better to store the numbers internally as the base numbers so you can do calculations with them. In that case, just use `as.percent(yourPercs/100)`. You can store the printed strings by using `y – James Aug 22 '11 at 14:14
  • I disagree. If the aim is to format numbers, it is cleaner if all the function does *is* format the given input accordingly. Why should I have to divide my perfectly acceptable percentages just to fit in with your idea that percentages will be stored as a [0, 1] proportion? ;-) Technically, one could argue your function doesn't meet the brief @Andrie set us because nowhere do you format the numbers *and* return them. `print()` really (in the spirit of R) should give a printed representation; your function does that, but it doesn't format the input except at `print()`-time. – Gavin Simpson Aug 22 '11 at 14:21
  • If there was no standard for storing the percentages, then how would you tell if the input is 10% or 1000%? I guess I interpreted the problem differently, but I suppose for a fully flexible method, the user may as well learn how to use `paste`, `sprintf`, `prettyNum`, etc. – James Aug 22 '11 at 14:49
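One possible way to reconcile the two positions in this exchange is to put the formatting in a format() method, so the strings can be obtained and stored, and have print() merely display them. The sketch below is a variation, not James's code: the class name formattedNum and the constructor as.formattedNum are hypothetical. A scaleFactor of 1 covers numbers that are already percentages.

```r
# Hypothetical constructor: keep the raw numbers, record how to display them
as.formattedNum <- function(x, scaleFactor = 1, prefix = "", suffix = "",
                            precision = 2) {
    structure(x, scaleFactor = scaleFactor, prefix = prefix,
              suffix = suffix, precision = precision,
              class = c("formattedNum", class(x)))
}

# format() returns the display strings (extra arguments are ignored here)
format.formattedNum <- function(x, ...) {
    vals <- formatC(as.numeric(x) * attr(x, "scaleFactor"),
                    format = "f", digits = attr(x, "precision"))
    paste0(attr(x, "prefix"), vals, attr(x, "suffix"))
}

# print() just shows what format() produces and returns the object invisibly
print.formattedNum <- function(x, ...) {
    print(format(x), ...)
    invisible(x)
}

p <- as.formattedNum(c(0.5, 0.125), scaleFactor = 100, suffix = "%", precision = 1)
p            # prints "50.0%" "12.5%"
s <- format(p)  # the formatted strings, stored as a character vector
2 * p        # arithmetic works on the raw values and keeps the formatting
```

Arithmetic keeps the attributes here because no Ops method is defined for the class, so R's default arithmetic copies them to the result.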

I think this could be done through attributes. For example, let v <- 3.4. If it is in pounds sterling, we could use something like:

attributes(v)<-list(style = "descriptor", type = "currency", category = "pound")

If it is a percentage:

attributes(v)<-list(style = "descriptor", type = "proportion", category = "percentage")

Then, a special print method would be necessary. One could also incorporate a translation method, e.g. to convert from GBP to USD (pounds to dollars), centimeters to inches, etc.
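A print method will only be dispatched if the object has a class, so a sketch of this idea needs one more ingredient than the attributes above. In the code below, the class "descriptor", the helper describe(), the lookup table descriptorSymbols and convertCurrency() are all hypothetical; the style/type/category attributes follow the answer. No exchange rates are built in: the caller supplies the rate.

```r
# Hypothetical lookup from 'category' to display prefix/suffix
descriptorSymbols <- list(
    pound      = c(prefix = "\u00a3", suffix = ""),
    dollar     = c(prefix = "$",      suffix = ""),
    percentage = c(prefix = "",       suffix = "%")
)

# Attach the descriptor attributes; the class is what lets print() dispatch
describe <- function(x, type, category) {
    structure(x, style = "descriptor", type = type, category = category,
              class = "descriptor")
}

# Unknown categories get no prefix/suffix (NULL entries vanish in paste0)
format.descriptor <- function(x, ...) {
    sym <- descriptorSymbols[[attr(x, "category")]]
    paste0(sym[["prefix"]], format(as.numeric(x), ...), sym[["suffix"]])
}

print.descriptor <- function(x, ...) {
    print(format(x, ...))
    invisible(x)
}

# Convert between currencies; 'rate' = units of 'to' per unit of the original
convertCurrency <- function(x, to, rate) {
    stopifnot(attr(x, "type") == "currency")
    describe(as.numeric(x) * rate, type = "currency", category = to)
}

v <- describe(3.4, type = "proportion", category = "percentage")
v  # prints "3.4%"
convertCurrency(describe(10, "currency", "pound"), to = "dollar", rate = 1.5)
```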

The descriptor is essentially my proposal for a reserved flag indicating that the number needs special handling. This could later be extended to text strings, such as addresses and names. Other numbers, such as phone numbers, might need special decompositions into country code, area or regional codes, and so on down to extensions.

Such a package may be akin to ggplot for data types - special methods for storing, transforming, and printing things within types?

Such a system might ensure that dimensions are correct when multiplying values. That has real utility in a lot of applications.
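A minimal sketch of such dimension checking, using R's Ops group generic, is below. The class withUnit and its constructor are hypothetical. It rejects addition or subtraction of different units, builds composite unit strings for multiplication and division (without simplifying, so "m*m" is not reduced to "m^2"), and returns plain values for comparisons and other operators.

```r
# Hypothetical constructor: a number tagged with a unit string
withUnit <- function(x, unit) structure(x, unit = unit, class = "withUnit")

Ops.withUnit <- function(e1, e2) {
    u1 <- attr(e1, "unit")
    u2 <- if (missing(e2)) NULL else attr(e2, "unit")
    # Adding or subtracting different units is a dimension error
    if (.Generic %in% c("+", "-") && !is.null(u1) && !is.null(u2) && u1 != u2)
        stop("incompatible units: '", u1, "' ", .Generic, " '", u2, "'")
    # Compute the plain numeric result (handles unary minus/plus too)
    v1 <- as.numeric(e1)
    val <- if (missing(e2)) get(.Generic)(v1) else get(.Generic)(v1, as.numeric(e2))
    # Work out the unit of the result
    unit <- if (.Generic %in% c("*", "/") && !is.null(u1) && !is.null(u2)) {
        paste0(u1, .Generic, u2)
    } else if (.Generic %in% c("+", "-", "*", "/")) {
        if (is.null(u1)) u2 else u1
    } else {
        NULL  # comparisons, ^, etc. return a plain value
    }
    if (is.null(unit)) val else withUnit(val, unit)
}

a <- withUnit(3, "m"); b <- withUnit(4, "m"); s <- withUnit(2, "s")
a + b   # 7, unit "m"
a / s   # 1.5, unit "m/s"
2 * a   # 6, unit "m"
# a + s  # error: incompatible units
```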

To my knowledge, the only widespread handling of units in R is for bytes (bytes, KB, MB, etc.) and time (hours, seconds, etc.). Even so, the handling, while simple, isn't obvious: I still have to tell print the units to use. For instance, if I want to print an object's size in KB, I can't simply calculate object.size(v)/1024, because the result is still printed as (fractional) bytes rather than KB; I have to use print(object.size(v), units = "Kb").
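The object.size() behaviour described above can be seen directly. This is a small sketch using the "Kb" unit string accepted by print() and format() for "object_size" objects; the vector v is arbitrary.

```r
v <- runif(1e4)
s <- object.size(v)

# Division keeps the "object_size" class, so the result is still
# printed as a number of bytes rather than kilobytes.
kb <- s / 1024
class(kb)
kb

# The units have to be requested explicitly when printing or formatting.
print(s, units = "Kb")
format(s, units = "Kb")
```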

Iterator

ggplot2 has a bunch of functions for formatting common specific cases. These would be ideal, but for two things: they aren't really general enough, and you shouldn't really have to load ggplot2 (with all its dependencies) to get at such functions. You could try contacting Hadley to get the signatures changed to pass more arguments through to format, and have the functions moved to a lower-level package (plyr maybe, or their own package, ggtools?).
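To illustrate the kind of signature change suggested here, the sketch below is a base-R function factory that fixes a prefix, suffix and scale and forwards any other arguments to format(). It is not ggplot2's code; makeFormatter, percentFormat and dollarFormat are hypothetical names.

```r
# Hypothetical factory: returns a formatter with prefix, suffix and scale fixed;
# any other arguments are stored and passed through to format()
makeFormatter <- function(prefix = "", suffix = "", scale = 1, ...) {
    formatArgs <- list(...)
    function(x) {
        vals <- do.call(format, c(list(x * scale, trim = TRUE), formatArgs))
        paste0(prefix, vals, suffix)
    }
}

percentFormat <- makeFormatter(suffix = "%", scale = 100, nsmall = 1)
dollarFormat  <- makeFormatter(prefix = "$", nsmall = 2)

percentFormat(c(0.125, 0.5))  # "12.5%" "50.0%"
dollarFormat(c(3, 1250.5))    # "$3.00" "$1250.50"
```

Returning a function rather than a string makes the formatter easy to pass around, for example as a labelling function.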

Richie Cotton
  • Good tip. Hopefully this is already happening as part of the `ggplot2` rewrite. I know, for example, that the intention is to separate `ggplot2` into multiple packages that can more easily be re-used on their own. – Andrie Aug 22 '11 at 15:27