
I have a data structure that I got as a result of the problem stated here.

Code:

library(dplyr)

df <- tibble::tribble(~person, ~age, ~height,
                      "John", 1, 20,
                      "Mike", 3, 50,
                      "Maria", 3, 52,
                      "Elena", 6, 90,
                      "Biden", 9, 120)
df %>%
  mutate(
    age_c = cut(
      age,
      breaks = c(-Inf, 5, 10),
      labels = c("0-5", "5-10"),
      right = TRUE
    ),
    height_c = cut(
      height,
      breaks = c(-Inf, 50, 100, 200),
      labels = c("0-50", "50-100", "100-200"),
      right = TRUE
    )
  ) %>%
  count(age_c, height_c, .drop = FALSE)

# A tibble: 6 x 3
  age_c height_c     n
  <fct> <fct>    <int>
1 0-5   0-50         2
2 0-5   50-100       1
3 0-5   100-200      0
4 5-10  0-50         0
5 5-10  50-100       1
6 5-10  100-200      1

Now I am trying to create a scatter plot, but the axis values are not collapsed into unique categories; each one is repeated instead. I would expect the x-axis to have two values, 0-5 and 5-10 (what I get is 0-5, 0-5, 0-5, 5-10, 5-10, 5-10), and the y-axis to have three values, 0-50, 50-100 and 100-200 (instead I get two series of them).

The code I use to plot:

ggplot(df, aes(x=age_c, y=height_c))

Expected plot (where the size of the circles would be based on the value of n):
[Image: expected plot]

StupidWolf
CroatiaHR
  • This cannot be a scatter plot. Your values are factors. How would you want to plot 0-5 and 0-50?? Like what exactly do you mean by plotting? In the XY plane, there is no point known as 0-5, 0-50 – Onyambu Nov 11 '20 at 14:49
  • 0-5 are just margins, I see them more like labels ... 0-5 is category 1, 5-10 is category 2 .. – CroatiaHR Nov 11 '20 at 14:53
  • then that is not a scatter plot. A scatter plot is only used to graph real/continuous values and not categorical values – Onyambu Nov 11 '20 at 14:54
  • I think I am missing something in the logic. I have N value that I expect to be plotted based on the values of AGE and HEIGHT – CroatiaHR Nov 11 '20 at 14:56
  • A scatter plot takes in 2 values, X and Y, and both must be continuous (not categorical). – Onyambu Nov 11 '20 at 14:59
  • Also how do you want to plot a 3rd variable given 2 variables? Are you planning on having a 3D plot? you have 3 variables – Onyambu Nov 11 '20 at 15:00
  • Okay, how would you plot it if you wanted to give an overview of the data in my question? I want a plot with dots whose size is based on the N value (which is not important at this point). My first problem is getting any plot at all, since I can't get the x and y axes to show the values I need – CroatiaHR Nov 11 '20 at 15:04
  • This is some interesting data. As was mentioned in another comment, you can't have a scatterplot with two categorical axes. But you can use `geom_jitter()`, `geom_count()` and some others, too. I don't understand what you want, so I'll leave it at that. – Érico Patto Nov 11 '20 at 15:11
  • I have uploaded an expected plot, that I hope would make my idea of what I try to accomplish clear – CroatiaHR Nov 11 '20 at 15:12
  • probably you should try `ggplot(df, aes(x=age_c, y=height_c, size=n)) + geom_point()` – Onyambu Nov 11 '20 at 15:25

1 Answer


If you store the result of count() and plot that data.frame with a point layer, it should work:

library(dplyr)
library(ggplot2)

# Store the binned counts so they can be passed to ggplot()
countdf <- df %>%
  mutate(
    age_c = cut(
      age,
      breaks = c(-Inf, 5, 10),
      labels = c("0-5", "5-10"),
      right = TRUE
    ),
    height_c = cut(
      height,
      breaks = c(-Inf, 50, 100, 200),
      labels = c("0-50", "50-100", "100-200"),
      right = TRUE
    )
  ) %>%
  count(age_c, height_c, .drop = FALSE)


# Drop empty combinations, then map the count n to point size
countdf %>%
  filter(n > 0) %>%
  ggplot(aes(x = age_c, y = height_c, size = n)) +
  geom_point() +
  scale_size_continuous(range = c(5, 10), breaks = c(1, 2))

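As one of the comments suggests, `geom_count()` may be an alternative: it counts the observations at each x/y combination itself, so the separate `count()` step can be skipped. The following is only a sketch under some assumptions: the names `binned` and `p` are made up for illustration, and the size scale settings (`max_size = 10`, `breaks = c(1, 2)`) are arbitrary choices rather than anything required.

```r
library(dplyr)
library(ggplot2)

# Same toy data as in the question
df <- tibble::tribble(
  ~person, ~age, ~height,
  "John",  1,  20,
  "Mike",  3,  50,
  "Maria", 3,  52,
  "Elena", 6,  90,
  "Biden", 9, 120
)

# Bin the raw values but do NOT count them; `binned` is a hypothetical name
binned <- df %>%
  mutate(
    age_c = cut(age, breaks = c(-Inf, 5, 10), labels = c("0-5", "5-10")),
    height_c = cut(height, breaks = c(-Inf, 50, 100, 200),
                   labels = c("0-50", "50-100", "100-200"))
  )

# geom_count() tallies rows per (age_c, height_c) pair and maps the tally to size
p <- ggplot(binned, aes(x = age_c, y = height_c)) +
  geom_count() +
  scale_size_area(max_size = 10, breaks = c(1, 2)) +
  # drop = FALSE keeps a category on the axis even if no row falls into it
  scale_x_discrete(drop = FALSE) +
  scale_y_discrete(drop = FALSE)

p
```

Combinations with no observations simply get no point, so the `filter(n > 0)` step is not needed with this approach.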

StupidWolf