read a csv file, concatenate the first two columns with a dash, and then compare to a different csv file, removing any rows from it that start with the concatenated value in python

Here's an example of how to accomplish the task using the pandas library:

main.py
import pandas as pd

# read first csv file
df1 = pd.read_csv("file1.csv")

# concatenate first two columns with a dash
df1["concatenated"] = df1.iloc[:, 0].astype(str) + "-" + df1.iloc[:, 1].astype(str)

# read second csv file
df2 = pd.read_csv("file2.csv")

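# drop rows whose first column starts with any concatenated value
prefixes = tuple(df1["concatenated"])
starts_with_prefix = df2.iloc[:, 0].astype(str).map(lambda value: value.startswith(prefixes))
df2 = df2[~starts_with_prefix]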

# write the filtered dataframe to a new csv file
df2.to_csv("new_file2.csv", index=False)

The code reads the first CSV file with pd.read_csv, concatenates the first two columns with a dash, and stores the result in a new column named concatenated. It then reads the second CSV file and keeps only the rows whose first column does not start with one of the concatenated values. Finally, it writes the filtered dataframe to a new CSV file with to_csv, passing index=False to leave out the index column.
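
To try the prefix filter without touching any files, the steps above can be sketched as a small function run on in-memory data. This is only an illustration: the function name drop_rows_with_prefix, the column names, and the sample rows are hypothetical and stand in for file1.csv and file2.csv.

```python
import io

import pandas as pd


def drop_rows_with_prefix(keys_df: pd.DataFrame, target_df: pd.DataFrame) -> pd.DataFrame:
    """Return target_df without rows whose first column starts with a key.

    A key is "<col0>-<col1>", built from the first two columns of keys_df.
    (Hypothetical helper, not part of pandas.)
    """
    keys = keys_df.iloc[:, 0].astype(str) + "-" + keys_df.iloc[:, 1].astype(str)
    # Python's str.startswith accepts a tuple and checks every prefix in it
    prefixes = tuple(keys)
    first_col = target_df.iloc[:, 0].astype(str)
    starts_with_key = first_col.map(lambda value: value.startswith(prefixes)).astype(bool)
    return target_df[~starts_with_key]


# Made-up sample data standing in for file1.csv and file2.csv
file1 = io.StringIO("a,b,c\nx,1,foo\ny,2,bar\n")
file2 = io.StringIO("id,value\nx-1-extra,10\ny-2,20\nz-3,30\nx-10,40\n")

filtered = drop_rows_with_prefix(pd.read_csv(file1), pd.read_csv(file2))
print(filtered)
```

With this sample data only the z-3 row survives. Note that x-10 is dropped as well, because it starts with the key x-1; if that is not wanted, one option is to compare against the key plus a delimiter, or to fall back to an exact match with isin.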
