I'd like to plot the top 10 most frequent actors/actresses in American productions with an average rating above 7.

I've tried several combinations and searched Stack Overflow, but I really can't figure it out.

    df['actors'] = pd.Series(df['cast'].str.split(',', expand=True).stack().reset_index(drop=True))

    top_actors = df[df['country'] == 'United States']

    top_actors = df[df.actors != 'No Actors'].set_index('averageRating').actors.str.split(', ', expand=True).stack().reset_index(level=1, drop=True)
    plt.figure(figsize=(13,7))
    plt.title('Top 10 des acteurs américain')

    sns.countplot(y = top_actors, order=top_actors.value_counts().index[:10], palette='Blues')
    plt.show()

  • Have you tried adapting the list comprehension approach to your problem? `top_actors_above_7_rating = [actor for actor in complete_list if actor.rating > 7.0]` – glamredhel Apr 18 '21 at 09:14
  • No...I must admit that I am a beginner –  Apr 18 '21 at 09:21
  • Do you have any idea how I can solve my problem? –  Apr 18 '21 at 09:22
  • If you can read a line from the table and access its contents, then you can use the above approach. The question is how exactly you are reading the lines and how they get stored. If they are stored as objects, you will need to access the appropriate attributes, like `object.attribute > 7.0`. If they are stored as list elements, you will need `list[element_number] > 7.0`. You might need to convert to float in case values like '6.4' are stored as strings instead of floats. – glamredhel Apr 18 '21 at 09:27
  • this looks relevant to your problem: https://stackoverflow.com/questions/11350770/select-by-partial-string-from-a-pandas-dataframe?rq=1 – glamredhel Apr 18 '21 at 09:38

2 Answers

This may help.

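    import pandas as pd

    df = pd.read_csv("filename.csv")
    df_ = df.copy()  # work on a copy so the original frame stays untouched
    for i, r in df_.iterrows():
        # drop every row whose rating is below 7
        if r['averageRating'] < 7:
            df_.drop(i, inplace=True)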

The code above iterates over the rows and checks whether 'averageRating' is less than 7; if it is, the entire row is dropped.
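The same filter can also be written without a loop, using a boolean mask. This is only a minimal sketch: the small frame below is made-up sample data standing in for `filename.csv`, and the `pd.to_numeric` step assumes the ratings might have been read as strings.

```python
import pandas as pd

# Hypothetical sample data standing in for filename.csv
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "averageRating": ["6.4", "7.0", "8.1", "7.5"],
})

# Convert ratings to floats; values that cannot be parsed become NaN
df["averageRating"] = pd.to_numeric(df["averageRating"], errors="coerce")

# The boolean mask keeps only the rows rated 7 or higher
df_ = df[df["averageRating"] >= 7].copy()
```

Use `> 7` instead of `>= 7` if the rating has to be strictly above 7, as the question asks.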

S.R Keshav

Someone helped me; we can use a pivot table:

    import pandas as pd
    import seaborn as sns

    us_actors = df[df['country'] == 'United States']
    rate_over_7 = us_actors[us_actors['averageRating'] > 7]

    # One row per (title, actor), so each actor stays aligned with its rating
    exploded = rate_over_7.assign(actors=rate_over_7['cast'].str.split(',')).explode('actors')
    exploded['actors'] = exploded['actors'].str.strip()

    table = pd.pivot_table(exploded, values=['averageRating'], index='actors', aggfunc='mean')

    # Sort before slicing so the 10 rows kept are the highest-rated ones
    top_10_sort = table.sort_values(by=['averageRating'], ascending=False).iloc[:10]

    sns.barplot(data=top_10_sort, x="averageRating", y=top_10_sort.index)
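Note that the pivot table ranks actors by their mean rating, not by how often they appear. If the goal is the 10 most frequent actors among US titles rated above 7, counting appearances might be closer to what the question asks. A rough sketch, assuming the column names from the question (`country`, `averageRating`, `cast`) and using made-up sample data:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({
    "country": ["United States", "United States", "France", "United States"],
    "averageRating": [7.5, 8.2, 9.0, 6.1],
    "cast": ["Ann Lee, Bob Ray", "Ann Lee, Cy Dow", "Ann Lee", "Bob Ray"],
})

# Keep US titles rated strictly above 7
mask = (df["country"] == "United States") & (df["averageRating"] > 7)

# One row per actor, whitespace trimmed, then count appearances
actors = df.loc[mask, "cast"].str.split(",").explode().str.strip()
top_actors = actors.value_counts().head(10)
```

`top_actors` is a Series of counts indexed by actor name, so it can be plotted with something like `sns.barplot(x=top_actors.values, y=top_actors.index)`.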