As a starting point, let’s create a simple dataframe that we are going to use in this article:
import pandas as pd
data = {'name': ['Jack', 'Frank', 'Kelly', 'Rebecca', "Monica"],
'year': [2015, 2011, 2010, 2014, None],
'reports': [24, 4, 2, 31, None]}
df = pd.DataFrame(data, index = ['New York', 'New Orleans', 'Budapest', 'Helsinki', "Cologne"])
df
name reports year
New York Jack 24.0 2015.0
New Orleans Frank 4.0 2011.0
Budapest Kelly 2.0 2010.0
Helsinki Rebecca 31.0 2014.0
Cologne Monica NaN NaN
View Only Specific Columns Link to heading
Showing only one column
df['name']
name
New York Jack
New Orleans Frank
Budapest Kelly
Helsinki Rebecca
Showing multiple columns:
df[['name', 'year']]
name year
New York Jack 2015
New Orleans Frank 2011
Budapest Kelly 2010
Helsinki Rebecca 2014
Filter Rows Where… Link to heading
Showing only the rows where the year is greater than 2012:
df[df['year'] > 2012]
name reports year
New York Jack 24 2015
Helsinki Rebecca 31 2014
Showing only the rows where the year is greater than 2012 AND reports is smaller than 30:
df[(df['year'] > 2012) & (df['reports'] < 30)]
name reports year
New York Jack 24 2015
First/Last Rows Link to heading
Showing only the first 2 rows:
df[:2]
name reports year
New York Jack 24 2015
New Orleans Frank 4 2011
Showing only the last row:
df[-1:]
name reports year
Helsinki Rebecca 31 2014
Query String Link to heading
You can also use a query string (which has to be a boolean expression) to filter your dataframe using the query function. It may come handy when your filter options are dynamic.
Showing only the rows where the year is greater than 2012 OR name is “Frank”:
df.query('year > 2012 | name == "Frank"')
name reports year
New York Jack 24 2015
New Orleans Frank 4 2011
Helsinki Rebecca 31 2014
Where Value Is In Specified List Link to heading
Let’s say we have a list:
numbers = [4, 2]
We want to view rows where the reports value is in our numbers list:
df[df['reports'].isin(numbers)]
name reports year
New Orleans Frank 4 2011
Budapest Kelly 2 2010
Where Value Is/Not null(NaN) Link to heading
Show rows where year value is not null (aka. exists):
df[df['year'].notnull()]
Show rows where year value is null (aka. missing):
df[df['year'].isnull()]