In this tutorial from the Data Analytics in Python series, we use Python with Pandas to generate simple insights from data using grouping techniques. Grouping is the aggregation of data on the basis of one or more columns (attributes): the groupby method splits the data into groups according to the columns provided, so that each group can be summarized separately.
Let’s consider a small dataset of students, their marks in two subjects, and their respective teachers:
>>> import pandas as pd
>>> df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
...                    'Age': [18, 19, 18, 19],
...                    'Math': [75, 82, 89, 85],
...                    'Science': [65, 75, 86, 90],
...                    'Teacher': ['William', 'William', 'Robert', 'Robert']})
To get an idea of how our data looks, we can print the records as a table:
>>> df.head()
   Age  Math  Science Student  Teacher
0   18    75       65    Beth  William
1   19    82       75    Alex  William
2   18    89       86   Diana   Robert
3   19    85       90  Adrian   Robert
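Before aggregating anything, it can help to see what the split itself looks like. The following is a minimal sketch on the same data: it iterates over the groups produced by grouping on Teacher and counts the rows in each. The names grouped, students_by_teacher and sizes are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
                   'Age': [18, 19, 18, 19],
                   'Math': [75, 82, 89, 85],
                   'Science': [65, 75, 86, 90],
                   'Teacher': ['William', 'William', 'Robert', 'Robert']})

grouped = df.groupby('Teacher')

# Iterating yields (key, sub-DataFrame) pairs, one per teacher.
students_by_teacher = {}
for teacher, rows in grouped:
    students_by_teacher[teacher] = list(rows['Student'])
    print(teacher, students_by_teacher[teacher])

# size() counts the rows that fell into each group.
sizes = grouped.size()
print(sizes)
```

Nothing is aggregated here; the loop only shows which rows each group contains.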
Now let’s extract some basic insights from our Pandas DataFrame using the groupby function:
>>> df.groupby('Teacher').describe()
                     Age       Math    Science
Teacher
Robert  count   2.000000   2.000000   2.000000
        mean   18.500000  87.000000  88.000000
        std     0.707107   2.828427   2.828427
        min    18.000000  85.000000  86.000000
        25%    18.250000  86.000000  87.000000
        50%    18.500000  87.000000  88.000000
        75%    18.750000  88.000000  89.000000
        max    19.000000  89.000000  90.000000
William count   2.000000   2.000000   2.000000
        mean   18.500000  78.500000  70.000000
        std     0.707107   4.949747   7.071068
        min    18.000000  75.000000  65.000000
        25%    18.250000  76.750000  67.500000
        50%    18.500000  78.500000  70.000000
        75%    18.750000  80.250000  72.500000
        max    19.000000  82.000000  75.000000
Here we can read off some direct insights about the teachers. For instance, going by the mean values above, Robert’s students score higher than William’s in both Math and Science. This might suggest that Robert is a better teacher than William, or simply that he has stronger students; the summary alone cannot tell these apart. To look at the underlying records, we can filter Robert’s rows from the DataFrame as follows:
>>> df[df['Teacher']=='Robert']
   Age  Math  Science Student Teacher
2   18    89       86   Diana  Robert
3   19    85       90  Adrian  Robert
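The comparison of means discussed above can also be computed directly, without reading it off the full describe() summary. This is a minimal sketch on the same data. Selecting the Math and Science columns before calling mean() is a choice made here so that the non-numeric Student column is left out. get_group is shown as one possible alternative to the boolean filter.

```python
import pandas as pd

df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
                   'Age': [18, 19, 18, 19],
                   'Math': [75, 82, 89, 85],
                   'Science': [65, 75, 86, 90],
                   'Teacher': ['William', 'William', 'Robert', 'Robert']})

grouped = df.groupby('Teacher')

# Mean marks per teacher, restricted to the two subject columns.
mean_marks = grouped[['Math', 'Science']].mean()
print(mean_marks)

# get_group returns the rows belonging to a single group.
robert = grouped.get_group('Robert')
print(robert)
```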
For more Data Filtering Techniques using Pandas, visit Data Analytics in Python – Data Filtering with Pandas – Learning By Doing
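We can go further and compute the median of each numeric column for each teacher. Passing numeric_only=True skips the non-numeric Student column, which recent pandas versions no longer drop silently:
>>> df.groupby('Teacher').median(numeric_only=True)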
          Age  Math  Science
Teacher
Robert   18.5  87.0     88.0
William  18.5  78.5     70.0
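When several statistics are wanted at once, agg accepts a list of function names. The sketch below, on the same data, computes the median, mean and maximum of Math for each teacher. The choice of column and statistics is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
                   'Age': [18, 19, 18, 19],
                   'Math': [75, 82, 89, 85],
                   'Science': [65, 75, 86, 90],
                   'Teacher': ['William', 'William', 'Robert', 'Robert']})

# One row per teacher, one column per requested statistic.
stats = df.groupby('Teacher')['Math'].agg(['median', 'mean', 'max'])
print(stats)
```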
We can extract further insights on the basis of teachers and their students’ ages by grouping on two columns and taking the median, with numeric_only=True so that the Student column is skipped:
>>> df.groupby(['Teacher', 'Age']).median(numeric_only=True)
             Math  Science
Teacher Age
Robert  18     89       86
        19     85       90
William 18     75       65
        19     82       75
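Grouping on two columns produces a result indexed by both of them. The sketch below, on the same data, shows two common ways of working with that index: selecting one (Teacher, Age) pair with a tuple, and flattening the index back into ordinary columns with reset_index(). The names medians, robert_18 and flat are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
                   'Age': [18, 19, 18, 19],
                   'Math': [75, 82, 89, 85],
                   'Science': [65, 75, 86, 90],
                   'Teacher': ['William', 'William', 'Robert', 'Robert']})

medians = df.groupby(['Teacher', 'Age'])[['Math', 'Science']].median()

# The index has two levels, so a (Teacher, Age) tuple selects one group.
robert_18 = medians.loc[('Robert', 18)]
print(robert_18)

# reset_index() turns the two index levels back into regular columns.
flat = medians.reset_index()
print(flat)
```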
In the next post, we will look at how to apply arbitrary functions when using groupby in Pandas.

