I'm somewhat surprised that I can't find a solution to this problem on SO, but I've tried every search term that I think might apply. However, I may not be using the correct search terms so forgive me if this is a duplicate, and please point me in the correct direction. I have data that is grouped by sample and each sample has one value for each category, of which there are many. Here is an example dataframe (note that the number of samples and number of categories are usually different):

df <- data.frame( sample = c( "one", "two", "three", "four" ), 
  cat_1 = c( 2, 4, -6, 2 ), cat_2 = c( 1, 2, 2, 1 ), 
  cat_3 = c( 5, -5, 7, 2 ) ) 

I'm trying to create a plot where the x-axis has discrete points for each category, the y-axis is the value for all samples at each category, and those values for each sample across the categories are connected by lines of a color I can define.

It seems like ggplot2 is the way to go here, but I can't find a way to get this to work the way I want. It seems like I want colnames( df ) to be the x-axis variable when using aes(), but that warns me that x and y are not the same length. This seems like it should be quite simple to do, but I can't figure it out.

EDIT: I've come across the post "Plotting multiple variables from same data frame in ggplot", where the answer shows the exact type of plot I want to make, but I can't figure out how to use melt to change my data frame into a format that puts the column names, cat_1, cat_2, cat_3, as the id.vars.

Jesse

1 Answer

The function melt from the reshape2 package transforms data to long format: it stacks a set of columns into a single column. You may want to define the id variables, which remain unchanged after calling the function.

If called with only the data frame, melt will assume factor and character variables are id variables, and all others are measured. In addition, it gives default column names: "variable" and "value". In the result, the old column names become the values of the new column "variable".

> library(reshape2)
> melt(df)
Using sample as id variables
   sample variable value
1     one    cat_1     2
2     two    cat_1     4
3   three    cat_1    -6
4    four    cat_1     2
5     one    cat_2     1
6     two    cat_2     2
7   three    cat_2     2
8    four    cat_2     1
9     one    cat_3     5
10    two    cat_3    -5
11  three    cat_3     7
12   four    cat_3     2
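To make the stacking concrete, here is a minimal sketch (assuming reshape2 is installed) that spells out the arguments melt guesses by default and checks the shape of the result. The name `long` is only used for illustration.

```r
library(reshape2)

df <- data.frame(sample = c("one", "two", "three", "four"),
                 cat_1 = c(2, 4, -6, 2), cat_2 = c(1, 2, 2, 1),
                 cat_3 = c(5, -5, 7, 2))

# Spell out what melt() otherwise guesses: one id column, three measured columns
long <- melt(df, id.vars = "sample",
             measure.vars = c("cat_1", "cat_2", "cat_3"))

nrow(long)             # one row per sample/category pair: 4 samples x 3 columns
levels(long$variable)  # the former column names, kept in column order
```

Because `variable` is a factor whose levels follow the original column order, the categories should appear in that same order along a discrete axis.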

For your problem, you could use the following code, specifying id.vars (as a quoted column name) and more informative column names (the structure remains the same):

df2 <- melt(df, id.vars = "sample", variable.name = "category", value.name = "value")

> df2
   sample category value
1     one    cat_1     2
2     two    cat_1     4
3   three    cat_1    -6
4    four    cat_1     2
5     one    cat_2     1
6     two    cat_2     2
7   three    cat_2     2
8    four    cat_2     1
9     one    cat_3     5
10    two    cat_3    -5
11  three    cat_3     7
12   four    cat_3     2

ggplot(df2, aes( x=category, y=value, group=sample, col=sample)) + 
  geom_line()

This yields the following plot: [image]
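Since the question also asks for line colours you can define yourself, one possible way is `scale_colour_manual()` from ggplot2. This is only a sketch: the palette `my_cols` and the colour names in it are made up for illustration, and its names have to match the values in the `sample` column.

```r
library(reshape2)
library(ggplot2)

df <- data.frame(sample = c("one", "two", "three", "four"),
                 cat_1 = c(2, 4, -6, 2), cat_2 = c(1, 2, 2, 1),
                 cat_3 = c(5, -5, 7, 2))

df2 <- melt(df, id.vars = "sample",
            variable.name = "category", value.name = "value")

# Hypothetical palette: one named colour per sample
my_cols <- c(one = "firebrick", two = "steelblue",
             three = "darkgreen", four = "orange")

p <- ggplot(df2, aes(x = category, y = value,
                     group = sample, colour = sample)) +
  geom_line() +
  scale_colour_manual(values = my_cols)  # map each sample to its chosen colour

p
```

Using a named vector keeps the colour assignment tied to the sample names rather than to the order in which they happen to appear in the data.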

Please let me know whether this is what you want.

KoenV
  • Thanks very much, that is exactly what I was after. The explanation of using `melt` is very useful. My problem was in defining what was a variable and what was a category. – Jesse Jul 11 '17 at 14:55
  • My pleasure. I am happy I could help. You can find an excellent explanation about restructuring data: **[here](https://www.r-statistics.com/2012/01/aggregation-and-restructuring-data-from-r-in-action/)**. – KoenV Jul 11 '17 at 15:02