read a csv file, concatenate the first two columns with a dash, and then compare to a different csv file, removing any rows from it that start with the concatenated value. use only standard modules in python

You can use the csv module to read and write the CSV files, and os.path.exists to check that the second file exists before processing it. Here's some sample code:

main.py
import csv
import os

file1 = "file1.csv"
file2 = "file2.csv"

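# read file 1 and concatenate first two columns
with open(file1, newline='') as f:
    reader = csv.reader(f)
    lines = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or too-short rows instead of raising IndexError
        lines.append([row[0] + '-' + row[1]] + row[2:])

# collect the concatenated values once, in a set, for fast lookups
keys = {l[0] for l in lines}

# remove matching rows from file 2
if os.path.exists(file2):
    with open(file2, newline='') as f:
        reader = csv.reader(f)
        lines2 = []
        for row in reader:
            # keep rows too short to form a key, and rows whose key is not in file 1
            if len(row) < 2 or row[0] + '-' + row[1] not in keys:
                lines2.append(row)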
    # write new file 2
    with open(file2, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(lines2)

This code reads file1.csv, concatenates the first two columns with a dash, and then stores the result along with the remaining columns in a list of lists called lines.

Next, it checks if file2.csv exists. If it does, the first two columns of each of its rows are concatenated in the same way, and the result is compared against the values collected from file1.csv. Rows whose concatenated value matches are left out of the lines2 list of lists.
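The phrase "start with the concatenated value" in the question can also be read as a prefix test on the first field of file2.csv, rather than the exact match on the first two columns that the code above performs. The following is a minimal sketch of that alternative reading, not a replacement for the sample code; the helper name drop_prefixed_rows and the sample data are hypothetical.

```python
def drop_prefixed_rows(rows, keys):
    """Return the rows whose first field does not start with any key."""
    prefixes = tuple(keys)  # str.startswith accepts a tuple of prefixes
    kept = []
    for row in rows:
        # empty rows have no first field to test, so they are kept unchanged
        if row and row[0].startswith(prefixes):
            continue
        kept.append(row)
    return kept


# hypothetical data: keys as built from file1.csv, rows as read from file2.csv
keys = {"a-b", "c-d"}
rows = [["a-b-1", "x"], ["e-f-2", "y"], ["c-d", "z"], []]
print(drop_prefixed_rows(rows, keys))  # [['e-f-2', 'y'], []]
```

A prefix test is looser than an exact match: the key "a-b" would also remove a row whose first field is "a-bc-1". Which reading is right depends on how file2.csv is laid out.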

Finally, if file2.csv exists, it is overwritten with the content of lines2 using a csv.writer object.
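Because file2.csv is overwritten in place, an error partway through the write could leave it truncated. One possible safeguard, sketched here under the assumption that a temporary file in the same directory is acceptable, is to write the rows elsewhere first and then swap the file in with os.replace. The helper name rewrite_csv is hypothetical.

```python
import csv
import os
import tempfile


def rewrite_csv(path, rows):
    """Write rows to a temporary file, then swap it in place of path."""
    directory = os.path.dirname(os.path.abspath(path))
    # create the temporary file next to the target so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        os.replace(tmp_path, path)  # the original is only replaced after a full write
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temporary file
        raise


# usage with throwaway data in a temporary directory
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "file2.csv")
    rewrite_csv(target, [["a", "b", "1"], ["c", "d", "2"]])
    with open(target, newline="") as f:
        print(list(csv.reader(f)))  # [['a', 'b', '1'], ['c', 'd', '2']]
```

In the sample code, this would stand in for the final with block, called as rewrite_csv(file2, lines2).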

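Note that this code does not treat the first row specially: a header row is concatenated and compared like any other row. If your files have headers and you want to leave them out of the comparison, skip the first row of each file (for example with next(reader, None)) and write the header of file2.csv back before the remaining rows.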
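As a rough illustration of handling a header row, the sketch below reads the first row separately before building the concatenated values. The function name read_keys, its has_header parameter, and the in-memory sample are all assumptions made for the example; a real file opened with open(file1, newline='') could be passed in the same way.

```python
import csv
import io


def read_keys(f, has_header=True):
    """Return (header, keys) where keys holds 'col0-col1' for each data row."""
    reader = csv.reader(f)
    # next(reader, None) consumes the first row, or returns None for an empty file
    header = next(reader, None) if has_header else None
    keys = {row[0] + '-' + row[1] for row in reader if len(row) >= 2}
    return header, keys


# in-memory stand-in for file1.csv
sample = io.StringIO("first,second,third\na,b,1\nc,d,2\n")
header, keys = read_keys(sample)
print(header)        # ['first', 'second', 'third']
print(sorted(keys))  # ['a-b', 'c-d']
```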
