how to subset columns if they contain string in python

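To subset the columns of a pandas DataFrame whose names contain a given string, you can use the filter() method. df.filter(regex='string') returns the columns whose labels match the regular expression, and df.filter(like='string') does a plain substring match. Both look only at the column names, not at the values stored in the columns.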

Here's an example:

main.py
import pandas as pd

# create a sample dataframe; two of the column names contain 'foo'
df = pd.DataFrame({'foo_a': ['foo1', 'foo2', 'foo3'], 'bar_b': ['bar1', 'bar2', 'bar3'],
                   'foo_c': ['foofoo1', 'foo2foo', 'foo3foo'], 'd': [1, 2, 3]})

# keep only the columns whose names match the regex 'foo'
new_df = df.filter(regex='foo')

print(new_df)

This will output:

   foo_a    foo_c
0   foo1  foofoo1
1   foo2  foo2foo
2   foo3  foo3foo
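Since the regex argument is an ordinary regular expression, anchors can narrow where the string has to appear in the column name. The following is a minimal sketch rather than a definitive recipe; the column names foo_a, b_foo and bar are made up purely for illustration.

```python
import pandas as pd

# Hypothetical column names, chosen only for illustration
df = pd.DataFrame({'foo_a': [1, 2], 'b_foo': [3, 4], 'bar': [5, 6]})

# like= does a plain substring test on the column labels (no regex)
contains_foo = df.filter(like='foo')

# regex= is applied with re.search, so anchors restrict where the match may occur
starts_with_foo = df.filter(regex='^foo')
ends_with_foo = df.filter(regex='foo$')

print(list(contains_foo.columns))     # ['foo_a', 'b_foo']
print(list(starts_with_foo.columns))  # ['foo_a']
print(list(ends_with_foo.columns))    # ['b_foo']
```

If the string you are searching for contains regex metacharacters such as '.' or '(', like= is probably the safer choice because it does not interpret them.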

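Alternatively, you can build a boolean mask from the column names with df.columns.str.contains() and pass it to loc. Like filter(regex=...), this matches any column name that contains the string, not only names that are exactly equal to it:


main.py
new_df = df.loc[:, df.columns.str.contains('foo')]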

This also produces the same output as above.
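Both filter() and df.columns.str.contains() inspect column names only. If the goal is instead to keep the columns whose values contain the string, one possible approach is to test each column's values and select with the resulting mask. This is only a sketch under that assumption; has_substring is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo1', 'foo2', 'foo3'], 'B': ['bar1', 'bar2', 'bar3'],
                   'C': ['foofoo1', 'foo2foo', 'foo3foo'], 'D': [1, 2, 3]})

def has_substring(column, text):
    """Return True if any value in the column contains text (hypothetical helper)."""
    # Cast to str so that numeric columns can be tested as well
    return column.astype(str).str.contains(text, regex=False).any()

# apply() calls the helper once per column and returns one boolean per column
mask = df.apply(has_substring, text='foo')

# keep only the columns where at least one value contains 'foo'
value_df = df.loc[:, mask]

print(list(value_df.columns))  # ['A', 'C']
```

Replacing .any() with .all() would keep only the columns in which every value contains the string.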
