Find and Replace Macro or suggestions on other ways to approach?

DDarling

New Member
Joined
May 12, 2015
Messages
5
I have a list of thousands of URL's that I need to regularly edit and I have been manually doing this with a find and replace, but I would like to automate the work. Unfortunately there is very little consistency in the domain structure and a text to columns approach won't work.

I need to do the following with this list of URL's:
-Eliminate the "WWW" in front of all domains when it appears (this is the easiest find and replace aspect)
-Revise all sub-domains to only show the domain level info (remove the sub-domain part of the domain). For example, "help.cnn.com" needs to change to "cnn.com". However, there are thousands of sub-domain variations and keeping a list of them all is not an ideal way to approach (but keeping a list is possible if there are no other options)
-For country specific domains I need to keep the country parts of the URL.

For example, I need the list to go from:

Breaking News, U.S., World, Weather, Entertainment & Video News - CNN.com
help.cnn.com
blog.news.com
bbc.co.uk

To this:
cnn.com
cnn.com
news.com
bbc.co.uk

My domain list is in column E.

Is there a macro that can be created to do this or does anyone have alternative recommendations? when doing find and replace via a macro and a list can I still use the "*" function to find and replace any sub-domain against a specific domain (i.e. can I search *.cnn.com to change to cnn.com via a macro)

Thanks!
 

Excel Facts

How to fill five years of quarters?
Type 1Q-2023 in a cell. Grab the fill handle and drag down or right. After 4Q-2023, Excel will jump to 1Q-2024. Dash can be any character.
This formula turns your source strings into the requested ones. Try it with an extended list of URLs:

=LOWER(SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE(A1,"."," ")," ",REPT(" ",20)),(1*ISNUMBER(SEARCH(".??#",A1&"#"))+2)*20))," ","."))
 
Upvote 0
This formula turns your source strings into the requested ones. Try it with an extended list of URLs:

=LOWER(SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE(A1,"."," ")," ",REPT(" ",20)),(1*ISNUMBER(SEARCH(".??#",A1&"#"))+2)*20))," ","."))


This formula turns your source strings into the requested ones. Try it with an extended list of URLs:

=LOWER(SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE(A1,"."," ")," ",REPT(" ",20)),(1*ISNUMBER(SEARCH(".??#",A1&"#"))+2)*20))," ","."))


Ivan,

Thank you very much for the help. This worked for most of the domain strings and even if this is the best we can come up with it will save me a significant amount of time.

It did not work perfectly for domains that end with a two digit code in the following ways:

-The "www." aspect is not removed for domains ending in two digits. Examples below were unchanged by the formula:
artnails.bestpicts.me
artnat.grejt.pl

<tbody>
</tbody>

The last example of something I was that didn't seem to work perfectly is that the string "007.microsoftadvertisingexchange.com" was changed to "vertisingexchange.com". I can't understand this particular issue well as it has nothing to do with the other issues and is the only ".com" domain that I saw impacted. I will admit that I'm unable to go through each and every line to see if there are other instances of similar challenges, but I did a review and didn't find others.

Thank you very much for your help.


<table cellspacing="0" cellpadding="0" class="t1" verdana,="" arial,="" tahoma,="" calibri,="" geneva,="" sans-serif;"=""><tbody>[TR]
[TD="class: td1"]


[/TD]
[/TR]
</tbody></table>
 
Upvote 0
This formula turns your source strings into the requested ones. Try it with an extended list of URLs:

=LOWER(SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE(A1,"."," ")," ",REPT(" ",20)),(1*ISNUMBER(SEARCH(".??#",A1&"#"))+2)*20))," ","."))


Ivan,

Thank you very much for the help. This worked for most of the domain strings and even if this is the best we can come up with it will save me a significant amount of time.

It did not work very well for domains that end with a two digit code in the following ways (I hope this make sense!):

-The "www." aspect is not adjusted correctly for domains ending in two digits. Sometimes no changes are made, sometimes the formula removes one or two of the "W's".

Example: "www.teamspeak-verleih.de" became "w.teamspeak-verleih.de". The formula removed two of the "W's" in the "WWW", but not all of them. This happened multiple times and sometimes removed one or two of the W's.

-I think the issue impacts not just the "WWW" of course, but all sub-domains for domains that end with a two digit code. Overall the formula seemed to do a number of different things two domains ending in 2 digits.

I also think I might have an issue that prevents a perfect formula being easily implemented. The issue is shown through the examples below:

Example 1: "zshanakaz.bestpicts.me" and "zpravy.aktualne.cz" were left unchanged by the formula. I would love to be able to remove the subdomains here as well and show just "bestpicts.me" and "aktualne.cz"

However looking at example 2 below and comparing it to example 1 above I discovered a problem.

Example 2: ºÎ³¬ - ´ëÇѹα¹ ¹Ù´Ù³¬½ÃÀÇ ¸ÞÄ« was correctly changed to "ybada.co.kr".

In the case of Example 2 I want to keep the sub-domain, because it's not really a sub-domain, you actually need the entire string as the website isn't "co.kr". However, in Example 1 the sub-domains should be removed. Would I be correct to say that this complicates the use of a formula as a solution unless we're building a ton of if/then statements?

The last example of something I was that didn't seem to work perfectly is that the string "007.microsoftadvertisingexchange.com" was changed to "vertisingexchange.com". I can't understand this particular issue well as it has nothing to do with the other issues and is the only ".com" domain that I saw impacted. I will admit that I'm unable to go through each and every line to see if there are other instances of similar challenges, but I did a review and didn't find others.

Thank you very much for your help.
 
Upvote 0
Unfortunately, I do not exactly know the rules of constructing URLs so please list the URLs where the formula fails along with the required result.

Please change 20 in the formula to 100 (at both locations) to avoid problems with long strings („007….”)
 
Upvote 0
Ivan,

When I do a find and replace for "WWW." and remove the "WWW." from the domains before I apply the formula it works better.

Ideally, length of the domain wouldn't have any impact, however I see how to adjust the numbers in the formula per your direction.

Below are two tables that listed the original and desired URL's. The first table is specific to instances where the desired outcome did not match the result of the formula. The second table has just some more basic examples of the original and desired outcomes.

Thank you again for your help. The current status is far improved over my initial state. If we can further improve the outcome great, if not no worries and thanks.


Original:Revised After Formula Applied:Desired Outcome:
forum.4pforen.4players.de4pforen.4players.de4players.de
beruskavn.rajce.idnes.czrajce.idnes.czidnes.cz
fiks-s-c-biuro-ksiegowe-i-laczynska-l.znanyprawnik.placzynska-l.znanyprawnik.plznanyprawnik.pl
chicagoarchitecture.infohicagoarchitecture.infochicagoarchitecture.info
ultimate-newone.blogspot.filtimate-newone.blogspot.fiblogspot.fi
davidhills.dailyfunnypics.mevidhills.dailyfunnypics.medailyfunnypics.me
motors.friday-ad.co.ukfriday-ad.co.ukfriday-ad.co.uk
inframe.platform.onet.plplatform.onet.plonet.pl
forumarchiwum.gry-online.plorumarchiwum.gry-online.plgry-online.pl
heavyequipmentforums.comeavyequipmentforums.comheavyequipmentforums.com
springfieldspringfield.co.ukringfieldspringfield.co.ukspringfieldspringfield.co.uk

<tbody>
</tbody>

<tbody>
</tbody>


Original:Desired:
www.cnn.com

<tbody>
</tbody>
cnn.com

<tbody>
</tbody>
www.sports.cnn.com

<tbody>
</tbody>
cnn.com
sports.cnn.com

<tbody>
</tbody>
cnn.com
bbc.co.uk

<tbody>
</tbody>
bbc.co.uk

<tbody>
</tbody>
www.bbc.co.uk

<tbody>
</tbody>
bbc.co.uk

<tbody>
</tbody>
07da8c2485ccf10a.anonymous.google

<tbody>
</tbody>
anonymous.google

<tbody>
</tbody>
sports.khan.co.kr

<tbody>
</tbody>
khan.co.kr

<tbody>
</tbody>
hotgirlscollection.hostedgalleries.me

<tbody>
</tbody>
hostedgalleries.me

<tbody>
</tbody>
www.store.na.square.com
square.com

<tbody>
</tbody>
forumarchiwum.gry-online.pl

<tbody>
</tbody>
gry-online.pl

<tbody>
</tbody>
m.birdtrader.co.uk

<tbody>
</tbody>
birdtrader.co.uk

<tbody>
</tbody>

<tbody>
</tbody>
 
Upvote 0
Enter into B1 and copy down:

=LOWER(SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE(A1,"."," ")," ",REPT(" ",100)),(1*ISNUMBER(SEARCH(".co.??#",A1&"#"))+2)*100))," ","."))

In column C the desired results are listed, in col D „OK” means that the result given by the formula and the requested result are the same.
Excel Workbook
ABCD
1Breaking News, U.S., World, Weather, Entertainment & Video News - CNN.comcnn.comcnn.comOK
2help.cnn.comcnn.comcnn.comOK
3blog.news.comnews.comnews.comOK
4bbc.co.ukbbc.co.ukbbc.co.ukOK
5forum.4pforen.4players.de4players.de4players.deOK
6beruskavn.rajce.idnes.czidnes.czidnes.czOK
7fiks-s-c-biuro-ksiegowe-i-laczynska-l.znanyprawnik.plznanyprawnik.plznanyprawnik.plOK
8chicagoarchitecture.infochicagoarchitecture.infochicagoarchitecture.infoOK
9ultimate-newone.blogspot.fiblogspot.fiblogspot.fiOK
10davidhills.dailyfunnypics.medailyfunnypics.medailyfunnypics.meOK
11motors.friday-ad.co.ukfriday-ad.co.ukfriday-ad.co.ukOK
12inframe.platform.onet.plonet.plonet.plOK
13forumarchiwum.gry-online.plgry-online.plgry-online.plOK
14heavyequipmentforums.comheavyequipmentforums.comheavyequipmentforums.comOK
15springfieldspringfield.co.ukspringfieldspringfield.co.ukspringfieldspringfield.co.ukOK
16www.cnn.comcnn.comcnn.comOK
17www.sports.cnn.comcnn.comcnn.comOK
18sports.cnn.comcnn.comcnn.comOK
19www.bbc.co.ukbbc.co.ukbbc.co.ukOK
2007da8c2485ccf10a.anonymous.googleanonymous.googleanonymous.googleOK
21sports.khan.co.krkhan.co.krkhan.co.krOK
22hotgirlscollection.hostedgalleries.mehostedgalleries.mehostedgalleries.meOK
23www.store.na.square.comsquare.comsquare.comOK
24forumarchiwum.gry-online.plgry-online.plgry-online.plOK
25m.birdtrader.co.ukbirdtrader.co.ukbirdtrader.co.ukOK
Sheet
 
Upvote 0
Ivan,

This seems to work very well. Thank you for your assistance I appreciate it.

Thank you,
Daniel
 
Upvote 0

Forum statistics

Threads
1,215,491
Messages
6,125,109
Members
449,205
Latest member
ralemanygarcia

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top