Python : Find the Top 10 between multiple columns

Question

I'd like to plot the 10 actors/actresses who appear most often in American productions with an average rating above 7.

I've tried several combinations and searched Stack Overflow, but I really can't figure it out.

    df['actors'] = pd.Series(df['cast'].str.split(',', expand=True).stack().reset_index(drop=True))

    top_actors = df[df['country'] == 'United States']

    top_actors = df[df.actors != 'No Actors'].set_index('averageRating').actors.str.split(', ', expand=True).stack().reset_index(level=1, drop=True)
    plt.figure(figsize=(13,7))
    plt.title('Top 10 des acteurs américain')

    sns.countplot(y = top_actors, order=top_actors.value_counts().index[:10], palette='Blues')
    plt.show()

Have you tried adapting the list comprehension approach to your problem? `top_actors_above_7_rating = [actor for actor in complete_list if actor.rating > 7.0]` – glamredhel, Apr 18 '21 at 09:14
If you can read a line from the table and access its contents, then you can use the above approach. The question is how exactly you are reading the lines and how they get stored. If they are stored as objects, you will need to access the appropriate attribute, as in `object.attribute > 7.0`. If they are stored as list elements, you will need `list[element_number] > 7.0`. You might need to convert to float if values like '6.4' are stored as strings instead of floats. – glamredhel, Apr 18 '21 at 09:27
This looks relevant to your problem: https://stackoverflow.com/questions/11350770/select-by-partial-string-from-a-pandas-dataframe?rq=1 – glamredhel, Apr 18 '21 at 09:38

score 0 · Answer 1 · answered Apr 18 '21 at 09:42

This may help.

    import pandas as pd

    df = pd.read_csv("filename.csv")
    df_ = df.copy()  # work on a copy so the original frame is untouched
    # Drop every row whose rating is below 7
    for i, r in df_.iterrows():
        if r['averageRating'] < 7:
            df_.drop(i, inplace=True)

In the code above, we iterate over the rows and check whether 'averageRating' is less than 7; if it is, we drop the entire row.
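The same filter can also be written without a Python-level loop, which is usually faster on large frames. This is a minimal sketch, assuming a DataFrame with an `averageRating` column as in the question; the helper name `filter_by_rating` and the sample rows are made up for illustration. `pd.to_numeric` is included in case the ratings were read as strings. One difference from the loop: rows with a missing rating are dropped here, whereas the loop keeps them.

```python
import pandas as pd

def filter_by_rating(df, min_rating=7.0, column="averageRating"):
    """Return the rows whose rating is at least min_rating (hypothetical helper)."""
    # Coerce strings such as '6.4' to floats; unparseable or missing values become NaN
    ratings = pd.to_numeric(df[column], errors="coerce")
    # The boolean mask selects matching rows in one step; NaN compares as False
    return df[ratings >= min_rating].copy()

# Made-up sample data, only to show the call
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "averageRating": [7.5, "6.4", 7.0, None],
})
kept = filter_by_rating(sample)
print(kept["title"].tolist())
```

Using `>` instead of `>=` in the mask would match the "above 7" wording of the question more strictly.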

S.R Keshav

Indeed, it's a good idea – Apr 18 '21 at 09:58

score 0 · Accepted Answer · answered Apr 18 '21 at 12:55

Someone helped me: we can use a pivot table.

    import pandas as pd
    import seaborn as sns

    # US productions rated above 7
    us_actors = df[df['country'] == 'United States']
    rate_over_7 = us_actors[us_actors['averageRating'] > 7]

    # One row per (title, actor), built from the filtered frame so rows stay aligned
    actors = rate_over_7.assign(actor=rate_over_7['cast'].str.split(', ')).explode('actor')

    # Mean rating per actor
    table = pd.pivot_table(actors, values=['averageRating'], index='actor', aggfunc='mean')

    # Sort before slicing so the 10 rows kept are the highest-rated ones
    top_10_sort = table.sort_values(by=['averageRating'], ascending=False).head(10)

    sns.barplot(data=top_10_sort, x="averageRating", y=top_10_sort.index)
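The pivot table ranks actors by their mean rating. If "most present" is instead taken to mean the number of qualifying titles an actor appears in, counting names is enough. The following is a sketch under that assumption, using the column names `country`, `averageRating`, and `cast` and the placeholder value 'No Actors' from the question; the function name `top_actors_by_count` and the sample rows are hypothetical.

```python
import pandas as pd

def top_actors_by_count(df, country="United States", min_rating=7.0, n=10):
    """Count appearances per actor in titles from `country` rated above `min_rating`."""
    mask = (df["country"] == country) & (df["averageRating"] > min_rating)
    actors = (
        df.loc[mask, "cast"]
        .dropna()
        .str.split(",")   # one list of names per title
        .explode()        # one row per (title, actor)
        .str.strip()      # remove the space left after each comma
    )
    # Ignore the placeholder the question uses for titles without a cast
    actors = actors[actors != "No Actors"]
    return actors.value_counts().head(n)

# Made-up sample data, only to show the call
sample = pd.DataFrame({
    "country": ["United States", "United States", "France", "United States"],
    "averageRating": [8.1, 7.4, 9.0, 6.5],
    "cast": ["Ann Lee, Bob Ray", "Ann Lee", "Ann Lee, Cy Dee", "Bob Ray"],
})
counts = top_actors_by_count(sample)
print(counts)
```

The resulting Series could then be plotted with something like `sns.barplot(x=counts.values, y=counts.index)`.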
