
I have a DataFrame and I am looking at one of its columns, called names. Its unique values are:

array(['Katherine', 'Robert', 'Anne', nan, 'Susan', 'other'], dtype=object)

I am trying to find out how many times each of these unique names shows up in the column, for example that there are 223 instances of Katherine. How do I do this? I know value_counts just shows 1 for each of these, because they are the separate unique values.

kwashington122
  • `value_counts` is what you want. If there is more than one occurrence it should show them. If you think it's not doing that, please show a complete example demonstrating the problem. Note that you need to use `.value_counts()` on your actual column, not on the list of unique values. – BrenBarn Jan 15 '17 at 19:59
  • @BrenBarn there must be a duplicate of this question; it has been asked so many times. Still searching – EdChum Jan 15 '17 at 20:32
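BrenBarn's point about calling value_counts on the column rather than on the unique values can be illustrated with a small sketch. The DataFrame below is hypothetical example data, not the asker's; it contrasts counting the array of unique names (every count is 1) with counting the column itself.

```python
import numpy as np
import pandas as pd

# Hypothetical column with repeated names (illustrative data, not the asker's).
df = pd.DataFrame({'names': ['Katherine', 'Katherine', 'Robert', 'Anne',
                             np.nan, 'Katherine', 'Susan', 'other', 'Robert']})

# Pitfall: counting the *unique* values gives 1 for every name,
# because each name appears exactly once in the unique array.
uniques = pd.Series(df['names'].unique())
print(uniques.value_counts())

# Correct: count occurrences in the column itself.
counts = df['names'].value_counts()
print(counts)
```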

2 Answers


If I understand you correctly, you can use pandas.Series.value_counts.

Example:

import pandas as pd
import numpy as np

s = pd.Series(['Katherine', 'Robert', 'Anne', np.nan, 'Susan', 'other'])

s.value_counts()

Katherine    1
Robert       1
other        1
Anne         1
Susan        1
dtype: int64

The data you provided has only one of each name, so here is an example with multiple 'Katherine' entries:

s = pd.Series(['Katherine','Katherine','Katherine','Katherine', 'Robert', 'Anne', np.nan, 'Susan', 'other'])

s.value_counts()

Katherine    4
Robert       1
other        1
Anne         1
Susan        1
dtype: int64
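Note that the np.nan entry does not appear in either output above: value_counts drops missing values by default. A minimal sketch using the same Series, showing the documented `dropna` and `normalize` parameters (the variable names are just for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(['Katherine', 'Katherine', 'Katherine', 'Katherine',
               'Robert', 'Anne', np.nan, 'Susan', 'other'])

# By default (dropna=True) missing values are excluded from the counts.
default_counts = s.value_counts()

# dropna=False adds a row for NaN as well.
with_nan = s.value_counts(dropna=False)

# normalize=True returns proportions of the non-missing values instead of raw counts.
shares = s.value_counts(normalize=True)

print(with_nan)
print(shares)
```

With the default dropna=True, the proportions are computed over the 8 non-missing entries, so Katherine comes out as 0.5.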

To apply this to your DataFrame, call it on the column itself:

df['names'].value_counts()
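Since the question mentions wanting a figure like "223 instances of Katherine", here is a sketch of pulling out a single name's count. The DataFrame is hypothetical stand-in data. `counts.get` returns a default of 0 for a name that never occurs, and the boolean-mask version counts one name without computing all the others.

```python
import pandas as pd

# Hypothetical DataFrame standing in for the asker's data.
df = pd.DataFrame({'names': ['Katherine', 'Robert', 'Katherine', 'Anne',
                             'Susan', 'Katherine', 'other', 'Robert']})

counts = df['names'].value_counts()

# Count for one specific name, via the value_counts result
# (returns 0 instead of raising if the name is absent)...
katherine_count = counts.get('Katherine', 0)

# ...or directly with a boolean mask, without computing all counts.
katherine_mask_count = (df['names'] == 'Katherine').sum()

# Turn the counts into a two-column DataFrame if that is easier to work with.
counts_df = counts.rename_axis('name').reset_index(name='count')
print(counts_df)
```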
nipy

You could use groupby to achieve that:

df[['col1']].groupby(['col1']).agg(['count'])
Istvan
  • I don't think this would work. `df[['col1']]` will return a one column DataFrame. If you group the DataFrame on that column `agg` will not be able to find any other columns to aggregate. You can use `.size()` instead of `agg('count')` but I'd go with `value_counts`. – ayhan Jan 15 '17 at 20:03
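Following ayhan's suggestion, a groupby version can use `.size()`, which counts rows per group and therefore does not need a second column to aggregate. The following is a sketch on hypothetical data, using the asker's column name `names` in place of `col1`. By default the groupby result is ordered by group key rather than by frequency, so it is sorted here to resemble the value_counts output.

```python
import pandas as pd

# Hypothetical data; 'names' stands in for 'col1' from the answer above.
df = pd.DataFrame({'names': ['Katherine', 'Robert', 'Katherine', 'Anne',
                             'Susan', 'Katherine', 'other', 'Robert']})

# groupby(...).size() counts the rows in each group, so it works even though
# the grouping column is the only column in the frame.
sizes = df.groupby('names').size()

# The result is ordered by group key, not by frequency;
# sort it to get value_counts-style ordering.
sizes_sorted = sizes.sort_values(ascending=False)
print(sizes_sorted)
```

For this task, value_counts is the more direct choice; the groupby form is mainly useful once you are already grouping for other reasons.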