Remove all characters from a url except the root domain and domain extension

christojuan

New Member
Joined
Feb 19, 2013
Messages
6
Hi - This formula takes any url and removes everything that I want except that it does not remove the forward slash and all characters after that.


Code:
=IFERROR(MID(B10, FIND("^", SUBSTITUTE(B10,".","^", LEN(B10)-LEN(SUBSTITUTE(B10, ".", ""))-IF(ISNUMBER(SEARCH(OR(".co.",".com."), B10)), 2, 1)))+1, 100),B10)


The formula evaluates the items in the left column to the result in the right, but what I am looking for is for every url example in the left column to generate "woohoo.com" as a result of the formula in the right column, regardless of the top level domain extension (.com, .ru. .co, .net, .jp etc -in those cases it would generate woohoo.com, woohoo.ru, woohoo.net, woohoo.jp etc.).


http://www.bob.woohoo.com woohoo.com
http://www.bob.woohoo.com/ woohoo.com/
http://www.bill.woohoo.com woohoo.com
http://www.bill.woohoo.com/folder1 woohoo.com/folder1
http://www.bill.woohoo.com/folder1/ woohoo.com/folder1/
http://www.bill.woohoo.com/folder1/folder2 woohoo.com/folder1/folder2
http://www.bill.woohoo.com/folder1/folder2/ woohoo.com/folder1/folder2/
http://bill.woohoo.com woohoo.com
www.bill.woohoo.com woohoo.com
bill.woohoo.com woohoo.com


I would really appreciate help getting that formula to achieve that (if it is possible).


Thanks very much in advance!

 
Last edited:

Excel Facts

Is there a shortcut key for strikethrough?
Ctrl+S is used for Save. Ctrl+5 is used for Strikethrough. Why Ctrl+5? When you use hashmarks to count |||| is 4, strike through to mean 5.
Hi there! Welcome to the board!

If you can do a text to columns (delimited with the forward slash), you can use something like this (assuming your domain was in column C, which may include site add-ons)...

Code:
=RIGHT(C1,LEN(C1)-IF(LEN(C1)-LEN(SUBSTITUTE(C1,".",""))=1,0,FIND("|",SUBSTITUTE(C1,".","|",LEN(C1)-LEN(SUBSTITUTE(C1,".",""))-1))))

HTH
 
Upvote 0
Zach - Thanks very much for your quick reply. I tried the formula and it generates the same result as the original. I tried replacing the pipe with a forward slash but that does not remove the instances where there there is a forward slash and folders after the domain extension. can you give please give me a little more guidance on how I should change your formula to produce the desired result?

Hi there! Welcome to the board!

If you can do a text to columns (delimited with the forward slash), you can use something like this (assuming your domain was in column C, which may include site add-ons)...

Code:
=RIGHT(C1,LEN(C1)-IF(LEN(C1)-LEN(SUBSTITUTE(C1,".",""))=1,0,FIND("|",SUBSTITUTE(C1,".","|",LEN(C1)-LEN(SUBSTITUTE(C1,".",""))-1))))

HTH
 
Upvote 0
It is kind of ugly, but this formula appears to do what you asked for...

=TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE(A1,"."&TRIM(RIGHT(SUBSTITUTE(A1,".",REPT(" ",99)),99)),""),".",REPT(" ",99)),99))&"."&LEFT(TRIM(RIGHT(SUBSTITUTE(A1,".",REPT(" ",99)),99)),FIND("/",TRIM(RIGHT(SUBSTITUTE(A1,".",REPT(" ",99)),99))&"/")-1)
 
Upvote 0
actually something broke :(.. the examples I provided work, but when applying to a real list it seems to be hit and miss.

Please see:
Buses | Transport for London
which becomes:
uk/modalpages/2605.aspx
whereas the one above it worked (which does not make sense to me).

Also see:
Product Tours - Constant Contact Learning Center
which becomes:
com/features/tours.jsp
wheras the Your Local Guildford, Surrey Web Design Company worked propery and become thegoodwebcompany.com

I really don't want to feature creep this request, but I think I failed to cover the full list of example urls.

I'd appreciate it if someone could help me clean this up.

http://www.bob.woohoo.com
woohoo.com
http://www.bob.woohoo.com/
woohoo.com
http://www.bill.woohoo.com
woohoo.com
http://www.bill.woohoo.com/folder1
woohoo.com
http://www.bill.woohoo.com/folder1/
woohoo.com
http://www.bill.woohoo.com/folder1/folder2
woohoo.com
http://www.bill.woohoo.com/folder1/folder2/
woohoo.com
http://bill.woohoo.com
woohoo.com
www.bill.woohoo.com
woohoo.com
bill.woohoo.com
woohoo.com
http://portal.hud.gov/hudportal/HUD/states
hud.gov
http://gasprices.mapquest.com/
mapquest.com
http://historymatters.gmu.edu/browse/refdesk/
gmu.edu
http://blast.ncbi.nlm.nih.gov/
nih.gov
http://www.tfl.gov.uk/modalpages/2605.aspx
uk/modalpages/2605.aspx
http://www.constantcontact.com/features/tours.jsp
com/features/tours.jsp
http://www.bbc.co.uk/radio/contact/
co.uk
http://www.macromedia.com/support/documentation/en/flashplayer/help/help02.html
com/support/documentation/en/flashplayer/help/help02.html
http://radar.weather.gov/
weather.gov
http://www.thegoodwebcompany.com/
thegoodwebcompany.com
http://www.usa.gov/topics/money/insurance/tips.shtml
gov/topics/money/insurance/tips.shtml

<tbody>
</tbody>
 
Upvote 0
Okay, give this (even uglier) formula a try...

=TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))
 
Upvote 0
Wow. very close...

and I would fix this but I just cannot see where the space can be replaced by . in whoohoo.com etc. I've done some relatively involved formulas in the past, but your stuff is way, way far beyond what i can translate. Can you help me with this last piece?

http://www.bob.woohoo.com
woohoo com
http://www.bob.woohoo.com/
woohoo com
http://www.bill.woohoo.com
woohoo com
http://www.bill.woohoo.com/folder1
woohoo com
http://www.bill.woohoo.com/folder1/
woohoo com
http://www.bill.woohoo.com/folder1/folder2
woohoo com
http://www.bill.woohoo.com/folder1/folder2/
woohoo com
http://bill.woohoo.com
woohoo com
www.bill.woohoo.com
woohoo com
bill.woohoo.com
woohoo com
http://portal.hud.gov/hudportal/HUD/states
hud gov
http://gasprices.mapquest.com/
mapquest com
http://historymatters.gmu.edu/browse/refdesk/
gmu edu
http://blast.ncbi.nlm.nih.gov/
nih gov
http://www.tfl.gov.uk/modalpages/2605.aspx
tfl gov uk
http://www.constantcontact.com/features/tours.jsp
constantcontact com

<tbody>
</tbody>

Okay, give this (even uglier) formula a try...

=TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))
 
Upvote 0
Wow. very close...

and I would fix this but I just cannot see where the space can be replaced by . in whoohoo.com etc. I've done some relatively involved formulas in the past, but your stuff is way, way far beyond what i can translate. Can you help me with this last piece?

http://www.bob.woohoo.com
woohoo com
http://www.bob.woohoo.com/
woohoo com
http://www.bill.woohoo.com
woohoo com
http://www.bill.woohoo.com/folder1
woohoo com
http://www.bill.woohoo.com/folder1/
woohoo com
http://www.bill.woohoo.com/folder1/folder2
woohoo com
http://www.bill.woohoo.com/folder1/folder2/
woohoo com
http://bill.woohoo.com
woohoo com
www.bill.woohoo.com
woohoo com
bill.woohoo.com
woohoo com
http://portal.hud.gov/hudportal/HUD/states
hud gov
http://gasprices.mapquest.com/
mapquest com
http://historymatters.gmu.edu/browse/refdesk/
gmu edu
http://blast.ncbi.nlm.nih.gov/
nih gov
http://www.tfl.gov.uk/modalpages/2605.aspx
tfl gov uk
http://www.constantcontact.com/features/tours.jsp
constantcontact com

<TBODY>
</TBODY>
I didn't even notice the "dots" were gone. Okay, let's just SUBSTITUTE the spaces away...

=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))," ",".")
 
Upvote 0
woohoo! that is it. Rick, thank you very, very much. This is SO awesome and will save me hours of manual work. I really appreciate it.

I didn't even notice the "dots" were gone. Okay, let's just SUBSTITUTE the spaces away...

=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))," ",".")
 
Upvote 0

Forum statistics

Threads
1,215,191
Messages
6,123,553
Members
449,108
Latest member
rache47

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top