how to subset columns if they contain string in python

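To subset the columns of a pandas DataFrame by a string, you can use the filter() method. df.filter(regex='string') returns the columns whose names match the regular expression, so a plain string keeps every column whose name contains it. Note that filter() looks at the column labels, not at the values stored in the columns.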

Here's an example:

import pandas as pd

# create a sample dataframe; two of the column names contain 'foo'
df = pd.DataFrame({'foo_1': ['foo1', 'foo2', 'foo3'], 'bar_1': ['bar1', 'bar2', 'bar3'],
                   'foo_2': ['foofoo1', 'foo2foo', 'foo3foo'], 'num': [1, 2, 3]})

# keep only the columns whose names contain 'foo'
new_df = df.filter(regex='foo')

print(new_df)

This will output:

   foo_1    foo_2
0   foo1  foofoo1
1   foo2  foo2foo
2   foo3  foo3foo

Alternatively, you can build a boolean mask over the column names with str.contains() and pass it to the loc indexer. Like filter(regex=...), this matches any column name that contains the string, not only exact matches:

new_df = df.loc[:, df.columns.str.contains('foo')]

This selects the same columns as the filter() call above.
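
Both forms treat the string as a regular expression by default, which matters when it contains characters such as '.' or '+'. The sketch below shows one way to match the string literally instead, using filter(like=...) or str.contains(..., regex=False). The column names here are hypothetical and were chosen only to make the difference visible.

```python
import pandas as pd

# Hypothetical column names, chosen so that '.' matters as a literal character
df = pd.DataFrame({'price.usd': [1, 2], 'price_eur': [3, 4], 'qty': [5, 6]})

# As a regex, '.' matches any character, so 'price.' matches both price columns
as_regex = df.filter(regex='price.')

# like= keeps labels that contain the string as a plain substring
as_literal = df.filter(like='price.')

# The boolean-mask form can be made literal with regex=False
as_mask = df.loc[:, df.columns.str.contains('price.', regex=False)]

print(list(as_regex.columns))    # ['price.usd', 'price_eur']
print(list(as_literal.columns))  # ['price.usd']
print(list(as_mask.columns))     # ['price.usd']
```

If the string comes from user input, the literal forms are usually the safer default, since no escaping is needed.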
