
I want to merge two data frames so that the merged data frame keeps only the factor levels that actually occur in one of its variables. For example:

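# stringsAsFactors=TRUE is needed in R >= 4.0.0 to get the factor columns shown in the output
df1 <- data.frame(country=c("AA", "BB"), stringsAsFactors=TRUE)
df2 <- data.frame(country=c("AA", "BB", "CC"), name=c("Country A", "Country B", "Country C"), stringsAsFactors=TRUE)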
df3 <- merge(df1, df2, by="country")

Then:

> df3
  country      name
1      AA Country A
2      BB Country B

which is what I expected.

However, why does the factor 'name' have 3 levels when there are only 2 rows of data?

> str(df3)
'data.frame':   2 obs. of  2 variables:
 $ country: Factor w/ 2 levels "AA","BB": 1 2
 $ name   : Factor w/ 3 levels "Country A","Country B",..: 1 2

How do I get rid of 'Country C' in df3?

> table(df3)
       name
country Country A Country B Country C
     AA         1         0         0
     BB         0         1         0
Paulo S. Abreu

1 Answer

You could try:

table(droplevels(df3))
#         name
#country Country A Country B
# AA         1         0
# BB         0         1

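merge() does not drop unused factor levels, so df3$name keeps all three levels of df2$name even though only two of them occur in the merged rows. droplevels() removes the unused levels from every factor in the data frame. Another way is to rebuild just that one factor: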

df3$name <- factor(df3$name)
table(df3)
#         name
#country Country A Country B
# AA         1         0
# BB         0         1
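
To compare the two approaches side by side, here is a minimal sketch that checks the levels directly instead of reading them off table(). It assumes the columns are factors, which has to be requested with stringsAsFactors = TRUE in R >= 4.0.0; the names df3_all and df3_one are made up for this illustration.

```r
# Factors are requested explicitly, since data.frame() in R >= 4.0.0
# no longer converts strings to factors by default.
df1 <- data.frame(country = c("AA", "BB"), stringsAsFactors = TRUE)
df2 <- data.frame(country = c("AA", "BB", "CC"),
                  name = c("Country A", "Country B", "Country C"),
                  stringsAsFactors = TRUE)
df3 <- merge(df1, df2, by = "country")

# The unused level "Country C" is still counted after the merge
nlevels(df3$name)

# Option 1: drop unused levels from every factor column at once
df3_all <- droplevels(df3)

# Option 2: rebuild a single factor from the values actually present
# (df3_one is a hypothetical copy so that df3 stays unchanged)
df3_one <- df3
df3_one$name <- factor(df3_one$name)

levels(df3_all$name)
levels(df3_one$name)
```

droplevels() is probably the more convenient choice when several factor columns are affected, while factor() touches only the column you name.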
akrun
  • Thanks to this answer, I was able to find other references that helped me understand the problem even better, such as: http://stackoverflow.com/questions/1195826/dropping-factor-levels-in-a-subsetted-data-frame-in-r?rq=1 – Paulo S. Abreu Sep 19 '14 at 18:52