2

I have a Pandas dataframe that I need to sort by the data columns' maximum values. I am having trouble performing the sort because all of the sorting examples that I have found operate on all of the columns in the dataframe when performing the sort. In this case I need to sort only a subset of the columns. The first column contains a date, and the remaining 90 columns contain data. The 90 data columns are currently sorted alphabetically by their column name. I would like to sort them in decreasing order of their maximum value, which happens to be in the last row.

In the bigger scheme of things, this question is about how to perform sorting on a range of columns within a dataframe, rather than sorting all of the columns in the dataframe. There may be cases, for example, where I need to sort only columns 2 through 12 of a dataframe, while leaving the remaining columns in their existing order.

Here is a sample of the unsorted dataframe:

df.tail()
         Date  ADAMS  ALLEN  BARTHOLOMEW  BENTON  BLACKFORD  BOONE  BROWN   ...     WABASH  WARREN  WARRICK  WASHINGTON  WAYNE  WELLS  WHITE  WHITLEY
65 2020-05-10      8    828          356      13         14    227     28   ...         64      12      123          48     53     11    149       22
66 2020-05-11      8    860          367      16         14    235     28   ...         67      12      126          48     56     12    161       23
67 2020-05-12      8    872          371      17         14    235     28   ...         67      12      131          49     56     12    162       23
68 2020-05-13      9    897          382      17         14    249     29   ...         68      12      140          50     58     13    164       27
69 2020-05-14      9    955          394      21         14    252     29   ...         69      12      145          50     60     15    164       28

I would like to perform the sort so that the column with the largest value in row 69 is placed after df['Date'], with the columns ordered so that the values in row 69 decrease from left to right. Once that is done, I'd like to create a series containing the column headers, to generate rank list. Using the visible columns as an example, the desired list would be:

rank_list=[ "ALLEN", "BARTHOLOMEW", "BOONE", "WHITE", "WARRICK", ... "BLACKFORD", "WARREN", "ADAMS" ]

My biggest hurdle at present is that when I perform the sort I'm not able to exclude the Date column, and I'm receiving a type error:

TypeError: Cannot compare type 'Timestamp' with type 'int'

I am new to Pandas so I apologize if there is a solution to this problem that should be obvious. thanks.

Bob
  • 21
  • 3

2 Answers2

0

You can do it this way using sort_values once selected the right row and the range of column

#data sample
np.random.seed(86)
df = pd.DataFrame({'date':pd.date_range('2020-05-15', periods=5),
                   'a': np.random.randint(0,50, 5), 
                   'b': np.random.randint(0,50, 5), 
                   'c': np.random.randint(0,50, 5), 
                   'd': np.random.randint(0,50, 5)})

# parameters
start_idx = 1 #note: the indexing start at 0, so 1 is the second column
end_idx = df.shape[1] #for the last column
row_position = df.shape[0]-1 #for the last one

# create the new order
new_col_roder = df.columns.tolist()
new_col_roder[start_idx:end_idx] = df.iloc[row_position, start_idx:end_idx]\
                                     .sort_values(ascending=False).index
#reirder
df = df[new_col_roder]

print(df)
        date   c   a   d   b
0 2020-05-15  30  20  44  40
1 2020-05-16  45  32  29   9
2 2020-05-17  17  44  14  27
3 2020-05-18  13  28   4  41
4 2020-05-19  41  35  14  12 #as you can see, the columns are now c, a, d, b
Ben.T
  • 22,613
  • 6
  • 22
  • 44
  • Thanks. This method was straightforward, easy to implement, and produced the desired result. – Bob Jun 24 '20 at 05:59
  • @Bob glad it works, please consider to [accept/upvote](https://stackoverflow.com/help/someone-answers) the answer :) – Ben.T Jun 24 '20 at 12:53
0

I suggest the following:

# initialize the provided sample data frame
df = pd.DataFrame([['65 2020-05-10', 8, 828, 356, 13, 14, 227, 28, 64, 12, 123, 48, 53, 11, 149, 22],
                   ['66 2020-05-11', 8, 860, 367, 16, 14, 235, 28, 67, 12, 126, 48, 56, 12, 161, 23],
                   ['67 2020-05-12', 8, 872, 371, 17, 14, 235, 28, 67, 12, 131, 49, 56, 12, 162, 23],
                   ['68 2020-05-13', 9, 897, 382, 17, 14, 249, 29, 68, 12, 140, 50, 58, 13, 164, 27],
                   ['69 2020-05-14', 9, 955, 394, 21, 14, 252, 29, 69, 12, 145, 50, 60, 15, 164, 28]],
                  columns = ['Date', 'ADAMS', 'ALLEN', 'BARTHOLOMEW', 'BENTON', 'BLACKFORD', 'BOONE', 'BROWN', 'WABASH', 'WARREN', 'WARRICK', 'WASHINGTON', 'WAYNE', 'WELLS', 'WHITE', 'WHITLEY']
)

# a list of tuples in the form (column_name, max_value)
column_max_list = [(column, df[column].max()) for column in df.columns.values[1:]]

# sort the list descending by the max value
column_max_list_sorted = sorted(column_max_list, key = lambda tup: tup[1], reverse = True)

# extract only the column names
rank_list = [tup[0] for tup in column_max_list_sorted]

for i in range(len(rank_list)):

    # get the column to insert next
    col = df[rank_list[i]]

    # drop the column to be inserted back
    df.drop(columns = [rank_list[i]], inplace = True)

    # insert the column at the correct index
    df.insert(loc = i + 1, column = rank_list[i], value = col)

This yields the desired rank_list

['ALLEN', 'BARTHOLOMEW', 'BOONE', 'WHITE', 'WARRICK', 'WABASH', 'WAYNE', 'WASHINGTON', 'BROWN', 'WHITLEY', 'BENTON', 'WELLS', 'BLACKFORD', 'WARREN', 'ADAMS']

as well as the desired df:

    Date            ALLEN   BARTHOLOMEW BOONE   WHITE  ...
0   65 2020-05-10   828     356         227     149    ...
1   66 2020-05-11   860     367         235     161    ...
2   67 2020-05-12   872     371         235     162    ...
3   68 2020-05-13   897     382         249     164    ...
4   69 2020-05-14   955     394         252     164    ...
Michael Hodel
  • 71
  • 1
  • 4