How to scrub sensitive data in excel working with XML

MO454008

New Member
Joined
Aug 31, 2018
Messages
2
Hello -

Really need some help here.

I received chat logs in XML format and all the information is contained in a single excel sheet output. Transcript is extracted to single line XML.

Here is an example of what I want to remove and don't know how:

Jane.Smith@gmail.com -> How do I insert a rule that removes email addresses? Some are gmail, hotmail, yahoo, etc. Is there a way to create a rule that removes the 5 symbols in front of the @ and 5 after?

Example below - this text underneath is 20x longer and in a single cell

msgText treatAs="NORMAL">To begin with, I may need to ask for details to pull up your account and for security purposes too, can I have the account number and the primary email address of your account?</msgText></message><message userId="[deleted]" timeShift="1185" visibility="ALL" eventId="34"><msgText>account number is [deleted] and email address is [deleted.text]@yahoo.com</msgText

Additional questions: how to scrub phone numbers when they are formatted differently? 999 999 2222 vs. 999-999-2222 vs. 9999992222

Thanks,
Mark
 

Excel Facts

Shade all formula cells
To shade all formula cells: Home, Find & Select, Formulas to select all formulas. Then apply a light fill color.
This UDF will scrub emails and phone numbers. There is an example below.
Code:
Function ScrubEmail(xml As String)
Dim RX      As Object: Set RX = CreateObject("vbscript.regexp")
Dim Pattern As String: Pattern = "(\w+\.\w+)(@\w+\.\w+)"
Dim Pat2    As String: Pat2 = "\d{3}[ -]?\d{3}[ -]?\d{4}"
Dim tmp     As String: tmp = xml


With RX
    .Pattern = Pattern
    .Global = True
    .IgnoreCase = True
    .MultiLine = True
    If .test(xml) Then tmp = .Replace(xml, "[deleted]" & "$2")
    .Pattern = Pat2
    If .test(tmp) Then tmp = .Replace(tmp, "[deleted]")
    ScrubEmail = tmp
End With


End Function
<table valign="top" border="1"><caption>LEGO HTML</caption><col width="54"><col width="2391"><col width="2303">
<tr><td></td><th style="background-color:rgb(166, 166, 166)"><font style="color:rgb(0, 0, 0)">A</font></th><th style="background-color:rgb(166, 166, 166)"><font style="color:rgb(0, 0, 0)">B</font></th></tr>
<tr><th style="background-color:rgb(166, 166, 166)"><font style="color:rgb(0, 0, 0)">1</font></th><td style="background-color:rgb(255, 255, 255)"><font face="Calibri" style="color:rgb(0, 0, 0)"><b>Original</b></font></td><td style="background-color:rgb(255, 255, 255)"><font face="Calibri" style="color:rgb(0, 0, 0)"><b>Scrubbed</b></font></td></tr>
<tr><th style="background-color:rgb(166, 166, 166)"><font style="color:rgb(0, 0, 0)">2</font></th><td style="background-color:rgb(255, 255, 255)"><font face="Calibri" style="color:rgb(0, 0, 0)">msgText treatAs="NORMAL">To begin with, I may need to ask for details to pull up your account and for security purposes too, can I have the account number and the primary email address of your account?account number is 999 999 2222 or 999-999-2222 or 9999992222 and email address is Jane.Smith@yahoo.com and John.Smith@gmail.com</font></td><td style="background-color:rgb(255, 255, 255)"><font face="Calibri" style="color:rgb(0, 0, 0)">msgText treatAs="NORMAL">To begin with, I may need to ask for details to pull up your account and for security purposes too, can I have the account number and the primary email address of your account?account number is [deleted] or [deleted] or [deleted] and email address is [deleted]@yahoo.com and [deleted]@gmail.com</font></td></tr></table><table style="width:100%" valign="top" border="1"><caption>Worksheet Formulas</caption><tr><th style="background-color:rgb(166, 166, 166)"><font style="color:rgb(0, 0, 0)">Cell</font></th><th style="background-color:rgb(166, 166, 166)"><font style="color:rgb(0, 0, 0)">Formula</font></th></tr><tr><th style="background-color:rgb(255, 255, 255)"><font style="color:rgb(0, 0, 0)">B2</font></th></td><td style="background-color:rgb(255, 255, 255)"><font style="color:rgb(0, 0, 0)">=ScrubEmail(A2)</font></td></tr></table>
 
Upvote 0
Thanks for the response!

Is there a way to do it such that I stick with Excel and not run another program/script/tool?

Thanks,
Mark
 
Upvote 0

Forum statistics

Threads
1,216,084
Messages
6,128,721
Members
449,465
Latest member
TAKLAM

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top