anonymise a data file in csharp

Anonymizing data means removing personally identifiable information (PII) from the data, while still preserving its utility. Here is an example of how to anonymize a CSV file in C#:

```csharp
using System;
using System.IO;

class AnonymizeData
{
    static void Main()
    {
        string inputFile = "data.csv";
        string outputFile = "anonymized_data.csv";

        using (var reader = new StreamReader(inputFile))
        using (var writer = new StreamWriter(outputFile))
        {
            // Copy the header line (column names) to the output unchanged.
            string line = reader.ReadLine();
            if (line == null)
            {
                return; // Empty input file: nothing to anonymize.
            }
            writer.WriteLine(line);

            // Anonymize each data row and write it to the output file.
            while ((line = reader.ReadLine()) != null)
            {
                // Naive split: does not handle quoted fields that contain commas.
                string[] values = line.Split(',');
                values[0] = Anonymize(values[0]); // example: anonymize the first column
                // The string separator overload works on every .NET version.
                writer.WriteLine(string.Join(",", values));
            }
        }
    }

    // Placeholder: replaces a PII value with a random GUID.
    // A new GUID is generated on every call, so the same input value
    // gets a different replacement in each row.
    static string Anonymize(string value)
    {
        return Guid.NewGuid().ToString();
    }
}
```

In this example, the code reads an input CSV file named data.csv and writes an anonymized version of the data to an output CSV file named anonymized_data.csv. The first line of the input file is assumed to be a header line containing the column names, and it is copied to the output file unchanged. Each subsequent line is split into fields using Split(','), the first field (in this example) is replaced with a random value by the Anonymize method, and the fields are joined back together with string.Join and written to the output file with a StreamWriter. Note that Split(',') does not understand CSV quoting: a quoted field such as "Smith, Jane" is split into two fields, which shifts every column after it.
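If the input may contain quoted fields, one option is a small quote-aware splitter. The sketch below is a minimal, hypothetical helper (`CsvLine.Split` and `CsvLine.Escape` are names introduced here, not part of the original example). It follows the RFC 4180 quoting rules for a single line and assumes no field contains an embedded line break, which a line-by-line ReadLine loop cannot handle anyway.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits and re-quotes ONE CSV line.
// Assumption: fields never contain embedded line breaks.
static class CsvLine
{
    // Splits a line into fields, honouring double-quoted fields (which may
    // contain commas) and "" as an escaped quote inside a quoted field.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // "" inside quotes is a literal quote
                        i++;
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString()); // last field
        return fields;
    }

    // Quotes a field for output only if it contains a comma, quote, or line break.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

In the loop, you would call `CsvLine.Split(line)` instead of `line.Split(',')` and write `string.Join(",", values.Select(CsvLine.Escape))` (which needs `using System.Linq;`). For anything beyond simple files, a maintained CSV library such as CsvHelper is usually the safer choice.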

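The Anonymize method is a placeholder that replaces the input value with a random GUID. Because a new GUID is generated on every call, the same person gets a different replacement in each row, so you can no longer count, group, or join records by that column. In practice, you would choose a replacement that keeps the properties your analysis needs (for example, one consistent token per distinct value) while still protecting the privacy of the individuals represented in the data.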
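One common way to get a consistent token per distinct value is keyed hashing with HMAC-SHA256 from System.Security.Cryptography. The sketch below is an alternative to the GUID placeholder, not the original author's method; the `Pseudonymizer` class and its method names are hypothetical. The key must be kept secret: a plain, unkeyed hash of a low-entropy value such as a phone number can be reversed by hashing every candidate value, and anyone holding the key can do the same. That makes this pseudonymization rather than full anonymization.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: keyed-hash pseudonymization with HMAC-SHA256.
// The same input and key always produce the same token, so rows belonging
// to the same person can still be grouped or joined after replacement.
static class Pseudonymizer
{
    public static string Pseudonymize(string value, byte[] key)
    {
        using (var hmac = new HMACSHA256(key))
        {
            byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(value));
            // Hex-encode the 32-byte MAC as 64 lowercase characters.
            // BitConverter is used instead of Convert.ToHexString (.NET 5+)
            // for compatibility with older runtimes.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Generates a random 256-bit key. Keep it secret; losing or discarding it
    // means tokens can no longer be recomputed for new data.
    public static byte[] NewKey()
    {
        var key = new byte[32];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(key);
        }
        return key;
    }
}
```

In the original loop this would become `values[0] = Pseudonymizer.Pseudonymize(values[0], key);`, with `key` created once before the loop so that every row is hashed with the same key.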
