To find the row IDs (index labels) of the rows that hold the maximum value of each column in a DataFrame, use Pandas DataFrame’s .idxmax().
.idxmax() returns, for each column of a DataFrame, the row ID at which that column reaches its maximum.
We are not using .max() here because .max() returns the maximum value itself rather than the row in which that value resides. Sometimes it is useful to look at the whole row of a DataFrame where a column’s maximum is found. That is where .idxmax() helps: it gives the row label, which can then be passed to .loc to retrieve the full row.
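To make the difference concrete, here is a minimal sketch on a small hand-made DataFrame (the data and the string index "a", "b", "c" are hypothetical, chosen only so the labels are easy to tell apart from the values). .max() yields the values, .idxmax() yields the labels, and .loc turns a label back into a full row.

```python
import pandas as pd

# Hypothetical toy data with a non-default string index
df = pd.DataFrame(
    {"Mass": [0.2, 0.9, 0.5], "Volume": [0.7, 0.1, 0.3]},
    index=["a", "b", "c"],
)

# .max() returns the largest value in each column
max_values = df.max()
print(max_values)   # Mass 0.9, Volume 0.7

# .idxmax() returns the index label of the row holding that value
max_labels = df.idxmax()
print(max_labels)   # Mass 'b', Volume 'a'

# The label can be passed to .loc to get the whole row
print(df.loc[max_labels["Mass"]])   # Mass 0.9, Volume 0.1
```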
Syntax:
DataFrame.idxmax(axis=0, skipna=True)
Parameters :
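axis : 0 or ‘index’ (default) searches down each column and returns one row label per column; 1 or ‘columns’ searches across each row and returns one column label per row
skipna : Exclude NA/null values. If an entire row/column is NA, the result will be NA
Returns : Series of index labels (column labels when axis=1)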
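The effect of both parameters can be seen in a short sketch, assuming a small hypothetical DataFrame with one missing value. With the default axis=0 we get one row label per column; with axis=1 we get one column label per row, and the default skipna=True lets the row containing NaN still produce a result.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a single NaN in row 1
df = pd.DataFrame({"Mass": [0.2, np.nan, 0.5], "Volume": [0.7, 0.1, 0.3]})

# axis=0 (default): for each column, the row label of its maximum
per_column = df.idxmax(axis=0)
print(per_column)   # Mass -> 2, Volume -> 0

# axis=1: for each row, the column label of its maximum;
# skipna=True (default) ignores the NaN in row 1
per_row = df.idxmax(axis=1)
print(per_row)      # 0 -> Volume, 1 -> Volume, 2 -> Mass
```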
The following Python 3 code, with comments and the output of each command, shows how to use .idxmax() to get the row IDs of the maximum values in each column:
# import pandas
import pandas as pd
# import NumPy
import numpy as np
# Build a 20x4 DataFrame of random floats in [0, 1) as dummy data to play with
# (the values differ on every run, so your output will not match the one below)
df = pd.DataFrame(np.random.rand(20, 4), columns=['Mass', 'Volume', 'Weight', 'Area'])
df.head()
       Mass    Volume    Weight      Area
0  0.979216  0.766523  0.289889  0.250828
1  0.358765  0.385804  0.527318  0.011638
2  0.531363  0.812544  0.731196  0.335956
3  0.735230  0.956838  0.182387  0.831519
4  0.149036  0.053258  0.456686  0.180915
# Use .idxmax() to get the row ID of the maximum value in each column
df.idxmax()
Mass       0
Volume    12
Weight    19
Area       3
dtype: int64
# Use the row ID returned by .idxmax() to retrieve the whole row in which 'Mass' is largest
df.loc[df['Mass'].idxmax()]
Mass      0.979216
Volume    0.766523
Weight    0.289889
Area      0.250828
Name: 0, dtype: float64
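The same idea extends to every column at once: passing the full list of labels from .idxmax() to .loc returns one row per column's maximum. The sketch below assumes a seeded NumPy generator (the seed 0 is an arbitrary choice, not from the example above) so that the run is repeatable. It also shows that when several rows tie for the maximum, .idxmax() reports only the first one.

```python
import numpy as np
import pandas as pd

# Same layout as the dummy data above, but seeded (seed 0 is arbitrary)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 4)), columns=["Mass", "Volume", "Weight", "Area"])

# One row per column maximum; a row can repeat if it holds several maxima
max_rows = df.loc[df.idxmax().tolist()]
print(max_rows)

# On ties, idxmax() returns the first label holding the maximum
ties = pd.Series([3, 7, 7, 1])
print(ties.idxmax())   # 1
```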