Remove all characters from a url except the root domain and domain extension

christojuan

New Member
Joined
Feb 19, 2013
Messages
6
Hi - This formula takes any url and removes everything that I want except that it does not remove the forward slash and all characters after that.


Code:
=IFERROR(MID(B10, FIND("^", SUBSTITUTE(B10,".","^", LEN(B10)-LEN(SUBSTITUTE(B10, ".", ""))-IF(ISNUMBER(SEARCH(OR(".co.",".com."), B10)), 2, 1)))+1, 100),B10)


The formula evaluates the items in the left column to the result in the right, but what I am looking for is for every url example in the left column to generate "woohoo.com" as a result of the formula in the right column, regardless of the top level domain extension (.com, .ru. .co, .net, .jp etc -in those cases it would generate woohoo.com, woohoo.ru, woohoo.net, woohoo.jp etc.).


http://www.bob.woohoo.com woohoo.com
http://www.bob.woohoo.com/ woohoo.com/
http://www.bill.woohoo.com woohoo.com
http://www.bill.woohoo.com/folder1 woohoo.com/folder1
http://www.bill.woohoo.com/folder1/ woohoo.com/folder1/
http://www.bill.woohoo.com/folder1/folder2 woohoo.com/folder1/folder2
http://www.bill.woohoo.com/folder1/folder2/ woohoo.com/folder1/folder2/
http://bill.woohoo.com woohoo.com
www.bill.woohoo.com woohoo.com
bill.woohoo.com woohoo.com


I would really appreciate help getting that formula to achieve that (if it is possible).


Thanks very much in advance!

 
Last edited:

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
Hi there! Welcome to the board!

If you can do a text to columns (delimited with the forward slash), you can use something like this (assuming your domain was in column C, which may include site add-ons)...

Code:
=RIGHT(C1,LEN(C1)-IF(LEN(C1)-LEN(SUBSTITUTE(C1,".",""))=1,0,FIND("|",SUBSTITUTE(C1,".","|",LEN(C1)-LEN(SUBSTITUTE(C1,".",""))-1))))

HTH
 
Upvote 0
Zach - Thanks very much for your quick reply. I tried the formula and it generates the same result as the original. I tried replacing the pipe with a forward slash but that does not remove the instances where there there is a forward slash and folders after the domain extension. can you give please give me a little more guidance on how I should change your formula to produce the desired result?

Hi there! Welcome to the board!

If you can do a text to columns (delimited with the forward slash), you can use something like this (assuming your domain was in column C, which may include site add-ons)...

Code:
=RIGHT(C1,LEN(C1)-IF(LEN(C1)-LEN(SUBSTITUTE(C1,".",""))=1,0,FIND("|",SUBSTITUTE(C1,".","|",LEN(C1)-LEN(SUBSTITUTE(C1,".",""))-1))))

HTH
 
Upvote 0
It is kind of ugly, but this formula appears to do what you asked for...

=TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE(A1,"."&TRIM(RIGHT(SUBSTITUTE(A1,".",REPT(" ",99)),99)),""),".",REPT(" ",99)),99))&"."&LEFT(TRIM(RIGHT(SUBSTITUTE(A1,".",REPT(" ",99)),99)),FIND("/",TRIM(RIGHT(SUBSTITUTE(A1,".",REPT(" ",99)),99))&"/")-1)
 
Upvote 0
actually something broke :(.. the examples I provided work, but when applying to a real list it seems to be hit and miss.

Please see:
Buses | Transport for London
which becomes:
uk/modalpages/2605.aspx
whereas the one above it worked (which does not make sense to me).

Also see:
Product Tours - Constant Contact Learning Center
which becomes:
com/features/tours.jsp
wheras the Your Local Guildford, Surrey Web Design Company worked propery and become thegoodwebcompany.com

I really don't want to feature creep this request, but I think I failed to cover the full list of example urls.

I'd appreciate it if someone could help me clean this up.

[TABLE="class: t1"]
<tbody>[TR]
[TD="class: td1"]http://www.bob.woohoo.com
[/TD]
[TD="class: td1"]woohoo.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bob.woohoo.com/
[/TD]
[TD="class: td1"]woohoo.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1
[/TD]
[TD="class: td1"]woohoo.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1/
[/TD]
[TD="class: td1"]woohoo.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1/folder2
[/TD]
[TD="class: td1"]woohoo.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1/folder2/
[/TD]
[TD="class: td1"]woohoo.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]www.bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://portal.hud.gov/hudportal/HUD/states
[/TD]
[TD="class: td1"]hud.gov
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://gasprices.mapquest.com/
[/TD]
[TD="class: td1"]mapquest.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://historymatters.gmu.edu/browse/refdesk/
[/TD]
[TD="class: td1"]gmu.edu
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://blast.ncbi.nlm.nih.gov/
[/TD]
[TD="class: td1"]nih.gov
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.tfl.gov.uk/modalpages/2605.aspx
[/TD]
[TD="class: td1"]uk/modalpages/2605.aspx
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.constantcontact.com/features/tours.jsp
[/TD]
[TD="class: td1"]com/features/tours.jsp
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bbc.co.uk/radio/contact/
[/TD]
[TD="class: td1"]co.uk
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.macromedia.com/support/documentation/en/flashplayer/help/help02.html
[/TD]
[TD="class: td1"]com/support/documentation/en/flashplayer/help/help02.html
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://radar.weather.gov/
[/TD]
[TD="class: td1"]weather.gov
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.thegoodwebcompany.com/
[/TD]
[TD="class: td1"]thegoodwebcompany.com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.usa.gov/topics/money/insurance/tips.shtml
[/TD]
[TD="class: td1"]gov/topics/money/insurance/tips.shtml
[/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
Okay, give this (even uglier) formula a try...

=TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))
 
Upvote 0
Wow. very close...

and I would fix this but I just cannot see where the space can be replaced by . in whoohoo.com etc. I've done some relatively involved formulas in the past, but your stuff is way, way far beyond what i can translate. Can you help me with this last piece?

[TABLE="class: t1"]
<tbody>[TR]
[TD="class: td1"]http://www.bob.woohoo.com
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bob.woohoo.com/
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1/
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1/folder2
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1/folder2/
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]www.bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://portal.hud.gov/hudportal/HUD/states
[/TD]
[TD="class: td1"]hud gov
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://gasprices.mapquest.com/
[/TD]
[TD="class: td1"]mapquest com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://historymatters.gmu.edu/browse/refdesk/
[/TD]
[TD="class: td1"]gmu edu
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://blast.ncbi.nlm.nih.gov/
[/TD]
[TD="class: td1"]nih gov
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.tfl.gov.uk/modalpages/2605.aspx
[/TD]
[TD="class: td1"]tfl gov uk
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.constantcontact.com/features/tours.jsp
[/TD]
[TD="class: td1"]constantcontact com
[/TD]
[/TR]
</tbody>[/TABLE]

Okay, give this (even uglier) formula a try...

=TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))
 
Upvote 0
Wow. very close...

and I would fix this but I just cannot see where the space can be replaced by . in whoohoo.com etc. I've done some relatively involved formulas in the past, but your stuff is way, way far beyond what i can translate. Can you help me with this last piece?

[TABLE="class: t1"]
<TBODY>[TR]
[TD="class: td1"]http://www.bob.woohoo.com
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bob.woohoo.com/
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1/
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1/folder2
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.bill.woohoo.com/folder1/folder2/
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]www.bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]bill.woohoo.com
[/TD]
[TD="class: td1"]woohoo com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://portal.hud.gov/hudportal/HUD/states
[/TD]
[TD="class: td1"]hud gov
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://gasprices.mapquest.com/
[/TD]
[TD="class: td1"]mapquest com
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://historymatters.gmu.edu/browse/refdesk/
[/TD]
[TD="class: td1"]gmu edu
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://blast.ncbi.nlm.nih.gov/
[/TD]
[TD="class: td1"]nih gov
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.tfl.gov.uk/modalpages/2605.aspx
[/TD]
[TD="class: td1"]tfl gov uk
[/TD]
[/TR]
[TR]
[TD="class: td1"]http://www.constantcontact.com/features/tours.jsp
[/TD]
[TD="class: td1"]constantcontact com
[/TD]
[/TR]
</TBODY>[/TABLE]
I didn't even notice the "dots" were gone. Okay, let's just SUBSTITUTE the spaces away...

=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))," ",".")
 
Upvote 0
woohoo! that is it. Rick, thank you very, very much. This is SO awesome and will save me hours of manual work. I really appreciate it.

I didn't even notice the "dots" were gone. Okay, let's just SUBSTITUTE the spaces away...

=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(A1,FIND("://",A1)+3,99),A1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))," ",".")
 
Upvote 0

Forum statistics

Threads
1,223,098
Messages
6,170,100
Members
452,301
Latest member
QualityAssurance

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top