Query data in a pandas DataFrame in Python

To query and filter data in a pandas DataFrame in Python, you can use the loc and iloc indexers. Both are used with square brackets rather than called like functions.

loc is label-based: you select rows and columns by their index and column labels. It also accepts a boolean mask, which is how it is used for filtering on a condition.

iloc is position-based: you select rows and columns by their integer position, counting from 0.
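The difference between labels and positions is easiest to see when the row index is not the default 0, 1, 2, and so on. The following is a minimal sketch using a hypothetical frame named scores with string row labels; the names and values are illustrative only.

```python
import pandas as pd

# Hypothetical frame with string row labels, so labels and positions differ
scores = pd.DataFrame({"score": [10, 20, 30]}, index=["a", "b", "c"])

# loc: row labelled "b", column labelled "score"
by_label = scores.loc["b", "score"]

# iloc: second row (position 1), first column (position 0)
by_position = scores.iloc[1, 0]

print(by_label, by_position)
```

Both lookups reach the same cell here; loc gets there by name and iloc by counting.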

Here's an example of using loc and iloc to query and filter data in a pandas dataframe:

main.py
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
                   'Age': [25, 30, 35, 40, 45],
                   'Gender': ['F', 'M', 'M', 'M', 'F'],
                   'Salary': [50000, 60000, 70000, 80000, 90000]})

# select rows where Age > 30 (boolean mask), keeping the Name and Salary columns
print(df.loc[df['Age'] > 30, ['Name', 'Salary']])

# select the rows at positions 2 and 3, and the columns at positions 0 to 2
print(df.iloc[2:4, 0:3])

The first line of code imports the pandas library under the alias pd.

Then, we create a sample dataframe with four columns: "Name", "Age", "Gender", and "Salary".

To select rows that meet a condition, we use loc with a boolean mask. The first argument, df['Age'] > 30, is a boolean Series that keeps only the rows where Age is greater than 30. The second argument lists the labels of the columns to include in the output, here Name and Salary. The result contains the rows for Charlie, David, and Eva.
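The same pattern extends to more than one condition. The sketch below rebuilds the sample dataframe and combines two masks with the & operator; the variable names mask and older_men are illustrative and not part of the original example.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
                   'Age': [25, 30, 35, 40, 45],
                   'Gender': ['F', 'M', 'M', 'M', 'F'],
                   'Salary': [50000, 60000, 70000, 80000, 90000]})

# Wrap each comparison in parentheses: & binds more tightly than > and ==
mask = (df['Age'] > 30) & (df['Gender'] == 'M')

# Keep only the matching rows and the Name and Salary columns
older_men = df.loc[mask, ['Name', 'Salary']]
print(older_men)
```

Use | in place of & to keep rows that match either condition, and ~ to negate a mask.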

To select rows by integer position, we use iloc. The first argument, 2:4, selects the rows at positions 2 and 3 (the end of the slice is excluded), and the second argument, 0:3, selects the columns at positions 0, 1, and 2, which are Name, Age, and Gender.
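One detail worth checking is how slices behave with each indexer: iloc excludes the end position, while a label slice with loc includes the end label. A small sketch, assuming the default integer row index and a shortened two-column version of the sample dataframe:

```python
import pandas as pd

# Shortened version of the sample data, with the default index 0..4
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
                   'Age': [25, 30, 35, 40, 45]})

# Position slice: end is excluded, so this returns positions 2 and 3
by_position = df.iloc[2:4]

# Label slice: end is included, so this returns labels 2, 3, and 4
by_label = df.loc[2:4]

print(len(by_position), len(by_label))
```

With the default index the labels and positions happen to be the same numbers, which makes this difference easy to miss.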
