I have a dataframe (top_lang) that has a list of countries (country), the different languages spoken in each country (lang) and the number of people in each country that speak each language (langCountryPop). I have the langCountryPop column in descending order for each country and I would like to extract the largest number for each country.

This is a sample of my data: (image not available)

A sample output I would like is:

x = data.frame("country"= c("American Samoa", "Andorra"), "lang" = c("Samoan", "Catalan"), "langCountryPop" = c(56700, 31000))

but repeated for all the countries in my dataset.

My attempt was:

top_lang %>% select(country, lang, langCountryPop) %>% arrange(country, max(langCountryPop))

But that doesn't return just the most widely spoken language for each country. Is there a function that extracts the max value within a group, or is there another way to do this? Thanks!

Angie
    `top_lang %>% select(country, lang, langCountryPop) %>% group_by(country) %>% filter(langCountryPop==max(langCountryPop))` can help! – Duck Jul 21 '20 at 17:41

1 Answer

Here are two dplyr solutions:

Data:

df = data.frame("country"= c("American Samoa", "American Samoa", "American Samoa", "Andorra", "Andorra", "Andorra"), 
               "lang" = c("Samoan", "Japanese", "English", "Catalan", "Spanish", "French"), 
               "langCountryPop" = c(56700, 1500, 1234, 31000, 24600, 2400))

First Solution:

This relies on the fact that, as you say, the langCountryPop values are already sorted in decreasing order within each country, so the first row of each group holds the maximum. You can pick out that row using slice:

library(dplyr)
df %>% 
  group_by(country) %>% 
  slice(1) 
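
If the rows were not already sorted, slice(1) would just return whichever row happens to come first in each group. A minimal sketch of the same idea that sorts first, assuming a shuffled copy of the sample data (df_unsorted and top_sorted are names made up for this illustration):

```r
library(dplyr)

# Same sample values as above, but with the rows deliberately shuffled
# (df_unsorted is a hypothetical name used only for this sketch)
df_unsorted <- data.frame(
  country = c("Andorra", "American Samoa", "Andorra",
              "American Samoa", "Andorra", "American Samoa"),
  lang = c("French", "English", "Catalan", "Samoan", "Spanish", "Japanese"),
  langCountryPop = c(2400, 1234, 31000, 56700, 24600, 1500)
)

top_sorted <- df_unsorted %>%
  arrange(country, desc(langCountryPop)) %>%  # largest value first within each country
  group_by(country) %>%
  slice(1) %>%                                # keep the first row of each group
  ungroup()

top_sorted
```

Sorting explicitly costs one extra step but removes the dependence on the input order.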

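Second solution: keep only the rows where langCountryPop equals the group maximum. This does not depend on the row order; note that if two languages tie for the maximum in a country, both rows are kept: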

library(dplyr)
df %>% group_by(country) %>% filter(langCountryPop == max(langCountryPop))

Result of either method:

# A tibble: 2 x 3
# Groups:   country [2]
  country        lang    langCountryPop
  <chr>          <chr>            <dbl>
1 American Samoa Samoan           56700
2 Andorra        Catalan          31000
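
As a possible third variant, newer dplyr versions (1.0.0 and later, which is an assumption about your setup) provide slice_max(), which needs no pre-sorting and lets you choose how ties are handled. A minimal sketch on the same sample data (top_max is a name made up for this illustration):

```r
library(dplyr)  # slice_max() requires dplyr >= 1.0.0

# Same sample data as in the answer above
df <- data.frame(
  country = c("American Samoa", "American Samoa", "American Samoa",
              "Andorra", "Andorra", "Andorra"),
  lang = c("Samoan", "Japanese", "English", "Catalan", "Spanish", "French"),
  langCountryPop = c(56700, 1500, 1234, 31000, 24600, 2400)
)

top_max <- df %>%
  group_by(country) %>%
  # with_ties = FALSE returns exactly one row per group even if values tie;
  # the default (with_ties = TRUE) would keep every tied row
  slice_max(langCountryPop, n = 1, with_ties = FALSE) %>%
  ungroup()

top_max
```

Whether to keep or drop ties depends on what you want to report, so treat with_ties = FALSE here as one choice rather than the right one.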
Chris Ruehlemann