Anonymize identifying information without losing data

yewborn · Mar 30, 2020

I have a large dataset (70,000 rows) on property ownership, which includes personally identifying information (i.e. people's names). I want to make the data anonymous so that other users cannot identify the property owners. However, I need to preserve the internal fidelity of the names, so that users can still tell when multiple properties have the same owner. To further complicate things, the data are littered with typos, so the same name may appear several times with slightly different spellings. Here's an example:

Property number	Owner name
1	Jeremiah Wilson
2	Emily Chang
3	Emily Chang
4	Jeremaih Wilson
5	Jeremiah W. Brown

In the above example data, I want to achieve the following:

the names in column 2 (Owner name) cannot be traced back to the real owners
the anonymized names for properties 2 and 3 are identical to each other
the anonymized names for properties 1 and 4 are similar enough to show that the difference is a typo
the anonymized names for properties 1 and 5 are clearly different

In an ideal world, the solution would be relatively simple to implement. I am requesting these data from a government agency, and I need to provide them with instructions for how to anonymize the data while preserving the internal fidelity of the names.
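
To make the four requirements concrete, here is a minimal sketch of one way such instructions could work. It is an illustration under assumptions, not an established method. It is written in Python using only the standard library; the function names `normalize` and `pseudonymize`, the code format `G00001-V1`, and the similarity threshold of 0.9 are all hypothetical choices. The idea is to replace each name with an arbitrary two-part code: the first part is shared by spellings that look like the same owner, and the second part distinguishes the individual spellings.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def pseudonymize(names, threshold=0.9):
    """Return one code per name, shaped like 'G00001-V1' (hypothetical format).

    Same G part: spellings judged similar enough to be the same owner.
    Same G and V part: identical spelling after normalization.
    The threshold of 0.9 is an assumption and would need tuning on real data.
    """
    # Each group remembers the first spelling seen and numbers its variants.
    groups = []
    codes = []
    for raw in names:
        name = normalize(raw)
        for index, group in enumerate(groups):
            # Compare against the group's first spelling only.
            if SequenceMatcher(None, name, group["rep"]).ratio() >= threshold:
                break
        else:
            # No existing group is similar enough, so start a new one.
            index = len(groups)
            group = {"rep": name, "variants": {}}
            groups.append(group)
        # Reuse the variant number for a known spelling, else assign the next one.
        variant = group["variants"].setdefault(name, len(group["variants"]) + 1)
        codes.append(f"G{index + 1:05d}-V{variant}")
    return codes


names = [
    "Jeremiah Wilson",
    "Emily Chang",
    "Emily Chang",
    "Jeremaih Wilson",
    "Jeremiah W. Brown",
]
for number, code in enumerate(pseudonymize(names), start=1):
    print(number, code)
# 1 G00001-V1
# 2 G00002-V1
# 3 G00002-V1
# 4 G00001-V2
# 5 G00003-V1
```

Because the codes are counters rather than values derived from the names, they cannot be reversed by guessing common names. The trade-offs are that a fuzzy threshold may merge two different people whose names are nearly identical, and that comparing every name against every group may be slow on 70,000 rows.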

I apologize if I'm not using the correct terminology for aspects of this problem. Thank you for your help.
