Pandas from basic to advanced for Data Scientists
Pandas makes life easier for any data scientist
A data scientist is someone who can obtain, scrub, explore, model, and interpret data, blending hacking, statistics, and machine learning — Hilary Mason
Where there is smoke, there is fire. Similarly, wherever there is data, pandas never takes a holiday: it helps you explore and interpret data quickly.
Pandas is one of the most widely used Python libraries for data manipulation and data analysis.
In this article, I will cover the pandas concepts and tricks that make our lives easier, moving from basic to advanced. I will walk through them using weather data.
Import Pandas and create a data frame
There are many ways to create a data frame: from files, lists, dictionaries, etc. Here I create one by reading data from a CSV file.
import pandas as pd
df = pd.read_csv("weather.csv")
df
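To show the other creation routes mentioned above side by side, here is a minimal sketch. Because weather.csv is not available here, it assumes a few made-up rows with the columns the article uses (day, city, temperature, windspeed, event); the values are hypothetical, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample rows; NOT the article's real weather.csv
csv_text = """day,city,temperature,windspeed,event
1/1/2017,new york,32,6,Rain
1/2/2017,new york,36,7,Sunny
1/1/2017,mumbai,90,5,Sunny
"""

# 1) From CSV text (io.StringIO behaves like an open file)
df_csv = pd.read_csv(io.StringIO(csv_text))

# 2) From a dictionary: each key becomes a column name
df_dict = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/1/2017"],
    "city": ["new york", "new york", "mumbai"],
    "temperature": [32, 36, 90],
    "windspeed": [6, 7, 5],
    "event": ["Rain", "Sunny", "Sunny"],
})

# 3) From a list of rows plus explicit column names
rows = [
    ["1/1/2017", "new york", 32, 6, "Rain"],
    ["1/2/2017", "new york", 36, 7, "Sunny"],
    ["1/1/2017", "mumbai", 90, 5, "Sunny"],
]
df_list = pd.DataFrame(rows, columns=["day", "city", "temperature", "windspeed", "event"])

print(df_csv.shape)  # (3, 5)
```

All three routes produce the same data frame here, so pick whichever matches where your data already lives.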
Select the specific column(s)
Sometimes you only need to work with specific columns. Say you want to analyze how the temperature changes from day to day; in that case, select just the temperature and day columns.
df[['temperature', 'day']]
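The number of brackets matters: one label returns a Series, while a list of labels returns a DataFrame with the columns in the order you listed them. A small sketch, assuming hypothetical rows shaped like the weather data:

```python
import pandas as pd

# Hypothetical sample data with the article's column names
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017"],
    "city": ["new york", "new york", "new york"],
    "temperature": [32, 36, 28],
    "windspeed": [6, 7, 12],
    "event": ["Rain", "Sunny", "Snow"],
})

# Single brackets with one label -> a Series
temps = df["temperature"]

# Double brackets (a list of labels) -> a DataFrame, columns in the listed order
temp_day = df[["temperature", "day"]]

print(type(temps).__name__)     # Series
print(type(temp_day).__name__)  # DataFrame
print(list(temp_day.columns))   # ['temperature', 'day']
```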
Rename the column
Pandas provides a simple function, rename, to change the name of one or more columns. Note that it returns a new data frame and leaves df unchanged unless you assign the result back.
df.rename(columns={'temperature': 'temp', 'event': 'eventtype'})
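Because rename returns a copy, a common surprise is that df still has the old names afterwards. A minimal sketch with two hypothetical rows:

```python
import pandas as pd

df = pd.DataFrame({"temperature": [32, 36], "event": ["Rain", "Sunny"]})  # hypothetical rows

renamed = df.rename(columns={"temperature": "temp", "event": "eventtype"})
print(list(df.columns))       # ['temperature', 'event']  (original untouched)
print(list(renamed.columns))  # ['temp', 'eventtype']

# Keep the change by reassigning. By default (errors="ignore"),
# labels in the mapping that are not real columns are silently skipped.
df = df.rename(columns={"temperature": "temp", "not_a_column": "x"})
print(list(df.columns))       # ['temp', 'event']
```

If you would rather get an error for a misspelled column name, pass errors="raise".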
Filtering a data frame
Suppose you would like to see which cities had sunny weather, along with the dates.
df.loc[df.event == 'Sunny', ['day', 'city']]
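Filters can combine several conditions. The sketch below, using hypothetical rows, builds a boolean mask, combines it with a temperature threshold (the threshold of 80 is an arbitrary choice for illustration), and shows the equivalent query() string form.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/1/2017", "1/2/2017"],
    "city": ["new york", "new york", "mumbai", "mumbai"],
    "temperature": [32, 36, 90, 85],
    "event": ["Rain", "Sunny", "Sunny", "Fog"],
})

# Boolean mask: True where the event is Sunny
sunny = df["event"] == "Sunny"
print(df.loc[sunny, ["day", "city"]])

# Combine with & (and) or | (or); each comparison needs its own parentheses
hot_and_sunny = df.loc[sunny & (df["temperature"] > 80), ["day", "city"]]
print(hot_and_sunny)

# The same filter as a query() expression string
same = df.query("event == 'Sunny' and temperature > 80")[["day", "city"]]
```

Plain Python and/or do not work on masks; use & and | instead.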
So far we have covered the basics. Now let's dive deeper; this is where pandas really shines. As I said, pandas never takes a holiday, even when you need complex queries.
Grouping
Suppose you want to work with a particular group of data, for example only the rows that belong to new york. groupby splits the data frame into groups, and from the resulting group object you can also get summaries such as the sum, mean, or median of every group at once.
Group by City
city_group = df.groupby('city')
This creates a GroupBy object. To see the data for a specific group, use get_group:
city_group.get_group('new york')
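The "summary of all groups at a time" mentioned above looks like this in practice. A minimal sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "city": ["new york", "new york", "mumbai", "mumbai", "paris"],
    "temperature": [32, 36, 90, 85, 45],
    "windspeed": [6, 7, 5, 12, 20],
})

city_group = df.groupby("city")

# Rows of a single group, keeping their original index labels
print(city_group.get_group("new york"))

# Sum, mean and median of temperature for every group in one call
summary = city_group["temperature"].agg(["sum", "mean", "median"])
print(summary)

# Iterating yields (group name, sub-DataFrame) pairs
for city, rows in city_group:
    print(city, len(rows))
```

Group keys come out sorted by default (mumbai, new york, paris); pass sort=False to groupby to keep first-appearance order.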
Aggregations
In the previous section we only grouped the data by city. What if we want the average temperature and average wind speed for each city? This is where aggregations come in.
Group by and aggregate
df.groupby('city').agg({'temperature': 'mean', 'windspeed': 'mean'})
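agg also accepts several functions per column, and "named aggregation" (available since pandas 0.25) lets you choose the output column names directly. A sketch with hypothetical rows; the names avg_temp, max_temp and avg_wind are arbitrary labels chosen for illustration.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "city": ["new york", "new york", "mumbai", "mumbai"],
    "temperature": [32, 36, 90, 85],
    "windspeed": [6, 7, 5, 12],
})

# A list of functions per column -> MultiIndex columns like ('temperature', 'max')
multi = df.groupby("city").agg({"temperature": ["mean", "max"], "windspeed": "mean"})

# Named aggregation: output_name=(column, function) gives flat, readable columns
named = df.groupby("city").agg(
    avg_temp=("temperature", "mean"),
    max_temp=("temperature", "max"),
    avg_wind=("windspeed", "mean"),
)
print(named)
```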
Merging
So far we have worked with a single data frame. What if you have two data frames and want to analyze them together? This is where merge comes in: it joins two data frames on one or more common columns.
Create two data frames
df1 = pd.DataFrame({
    "city": ["new york", "florida", "mumbai"],
    "temperature": [22, 37, 35],
})
df2 = pd.DataFrame({
    "city": ["chicago", "new york", "florida"],
    "humidity": [65, 68, 75],
})
Simple merge: by default, merge performs an inner join, returning only the rows whose city appears in both data frames.
pd.merge(df1, df2, on='city')
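With the two data frames above, only new york and florida appear on both sides, so the inner merge keeps exactly those two cities. This sketch re-creates df1 and df2 so it runs on its own, and shows that the DataFrame.merge method form gives the same result as pd.merge.

```python
import pandas as pd

# Same data as the article's df1 and df2
df1 = pd.DataFrame({"city": ["new york", "florida", "mumbai"], "temperature": [22, 37, 35]})
df2 = pd.DataFrame({"city": ["chicago", "new york", "florida"], "humidity": [65, 68, 75]})

# how="inner" is the default; the left frame's key order is preserved
inner = pd.merge(df1, df2, on="city")
print(inner)  # two rows: new york and florida

# Method form, equivalent to the function call above
same = df1.merge(df2, on="city")
```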
Outer merge: to get all rows from both data frames, add the how parameter. Rows without a match on the other side get NaN in that side's columns.
pd.merge(df1, df2, on="city", how="outer")
Similarly, how="left" keeps every row of the left data frame plus the matching rows from the right (left join), and how="right" keeps every row of the right data frame plus the matching rows from the left (right join).
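To compare the join types at a glance, the sketch below runs all four on the same df1 and df2 and passes indicator=True, which adds a _merge column marking each row as both, left_only or right_only.

```python
import pandas as pd

# Same data as the article's df1 and df2
df1 = pd.DataFrame({"city": ["new york", "florida", "mumbai"], "temperature": [22, 37, 35]})
df2 = pd.DataFrame({"city": ["chicago", "new york", "florida"], "humidity": [65, 68, 75]})

results = {
    how: pd.merge(df1, df2, on="city", how=how, indicator=True)
    for how in ["inner", "left", "right", "outer"]
}

for how, frame in results.items():
    print(f"--- how={how!r} ---")
    print(frame)
```

Counting the _merge values (for example results["outer"]["_merge"].value_counts()) is a quick way to check how many keys failed to match.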
Crosstab
Suppose you want to see how often each event type (rainy, sunny, and so on) occurs in each city. crosstab makes this easy by building a frequency table.
pd.crosstab(df.city,df.event)
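crosstab has a few options worth knowing: margins=True adds row and column totals, and normalize="index" turns counts into per-city proportions. The sketch below, on hypothetical rows, also builds the same counts with groupby so you can see what crosstab computes.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "city": ["new york", "new york", "new york", "mumbai", "mumbai"],
    "event": ["Rain", "Sunny", "Sunny", "Sunny", "Fog"],
})

counts = pd.crosstab(df.city, df.event)

# margins=True appends an "All" row and column with totals
with_totals = pd.crosstab(df.city, df.event, margins=True)

# normalize="index": each city's row becomes proportions summing to 1
shares = pd.crosstab(df.city, df.event, normalize="index")

# Same counts via groupby: size of each (city, event) pair, events spread into columns
via_groupby = df.groupby(["city", "event"]).size().unstack(fill_value=0)
print(counts)
```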
Note: crosstab is not limited to counts. To compute another aggregation such as the mean or median, pass the column to aggregate as values and the function as aggfunc.
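As a sketch of that note, assuming hypothetical temperature readings: values and aggfunc must be given together, and city/event combinations with no rows come out as NaN rather than 0.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "city": ["new york", "new york", "new york", "mumbai", "mumbai"],
    "event": ["Rain", "Sunny", "Sunny", "Sunny", "Fog"],
    "temperature": [32, 36, 40, 90, 85],
})

# Mean temperature per (city, event) instead of a count
mean_temp = pd.crosstab(df.city, df.event, values=df.temperature, aggfunc="mean")

# Any other aggregation works the same way, e.g. the maximum
max_temp = pd.crosstab(df.city, df.event, values=df.temperature, aggfunc="max")
print(mean_temp)
```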
Reshape with melt
melt turns columns into rows, reshaping a data frame from wide to long format. Suppose that for each city you want temperature and wind speed stacked rather than side by side: the column names (temperature, windspeed) go into one column, named attribute here, and their values go into another, named value by default.
pd.melt(df, id_vars=['day', 'city', 'event'], var_name='attribute')
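The reverse of melt is pivot, which spreads the attribute column back out into separate columns. A minimal round-trip sketch on two hypothetical rows (pivot with a list of index columns needs pandas 1.1 or later):

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "day": ["1/1/2017", "1/1/2017"],
    "city": ["new york", "mumbai"],
    "temperature": [32, 90],
    "windspeed": [6, 5],
    "event": ["Rain", "Sunny"],
})

# Wide -> long: every non-id column (temperature, windspeed) becomes rows
long = pd.melt(df, id_vars=["day", "city", "event"], var_name="attribute")
print(long)  # 2 cities x 2 attributes = 4 rows

# Long -> wide again: attribute values become column names
wide = long.pivot(
    index=["day", "city", "event"], columns="attribute", values="value"
).reset_index()
print(wide)
```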
I hope you enjoyed it! You can also check out the article Pandas tricks for Data Scientists, which covers more tricks. Stay tuned, and please leave a comment with any questions or suggestions.