How to identify website address in a sentence

robertexcel

New Member
Joined
Jun 8, 2011
Messages
5
I was searching for some excel help online and have noticed this amazing forum. I hope even I will get some support here!

OK, here is the question (Tough one for me).. I have around 1000 sentences (one sentence in one row) in say column A, all these sentences have website addresses included. For E.g. in A1 the sentence is "there is a website address in this sentence and I am searching for some solution in mrexcel.com and I hope I get some reply as I have searched in www.google.com".

Notice that a website address may not start with 'WWW'. Now, is it possible in excel using formulas or some macro, to get mrexcel.com in column "B1" and www.google.com in column "C1".

You can assume that there will not be more than 10 website addresses in one sentence and also it is not necessary that a website should end with ".com". Finally, the # of rows can be a variable, as I get these sentences on a daily basis, in general # of rows will not exceed a million (I hope this is not a big number for all the programming geeks). I tried a bit on excel using formulas but I am not able to do it, can someone pleased guide me in this regard. Thanks in advance and hope I get some gr8 reply!
 

Excel Facts

Pivot Table Drill Down
Double-click any number in a pivot table to create a new report showing all detail rows that make up that number
In my opinion, the way you described the situation, you have an unsolvable problem.

You wrote:
"Notice that a website address may not start with 'WWW'."

You wrote:
"and also it is not necessary that a website should end with ".com"."

Wow. That means, many words can be a website. Example, in your sentence...
"there is a website address in this sentence and I am searching for some solution in mrexcel.com and I hope I get some reply as I have searched in www.google.com".
...there is a website www.website.com, and a website named www.address.com, and a website named www.searching.com, all words found in your example sentence, and a bottomless pit of other words that will be in sentences, not starting with "www." or ending in ".com" (or any other domain) which can be interpreted as URL's based on your far-and-wide criteria.

Yeeesh...if what you wrote really is the case, there's no unambiguous criteria, destined for failure.
 
Upvote 0
Hello Tom! thanks for the reply. Your questions tell me that you have understood my situation, but I think I am not in such a bad situation either. Probably I have forgot to mention that only those phrases can be considered as website address which have some extension. So, we need not consider "Website" or "Address" as a website name but website.org is definitely a website and address.co is also a website. Is there a list of website extensions or something like that, does that list help in solving the issue?
 
Upvote 0
Robert
Tom has it nailed, Excel is pretty **** smart, but not Psychic !!
However, you need to think about
How are the sentences constructed ? Is there only one sentence and one full stop in each cell ?
If this is the case then you may be able to extract either side of the full stop.
Where did the sentences come from, were they downloaded from say a txt, doc,....etc
or typed in by a user.
Are the "web addresses" in a different font, or Bold, or highlighted ?

If they are none of the above and simply an integral part of the sentence, then as Tom pointed out....you're destined for failure !
 
Upvote 0
Whow, I dint know that its that complicated, but you may be right. first, let me tell you how a general sentence is:

"mark uses a website called mrexcel.com and he also uses a website called wikipedia.org and I need to extract these two"

1. There may or may not be a full stop in sentence.
2. none of the text is bold or has different size or font.
3. I get a excel file which already has these sentences, so I am sorry but I dont know the actual source of these sentences. Let me know if there is a solution to this.

* Note: I think if we have a list of website extensions (I dont think there will be more than 300 to 400 different extensions) we can then compare each of them for each sentence. I also think that there will not be any "Space" in between the name of a website. So if we can find the extension from the list of extensions, we then have to get all the text which is to the left of the "Extension" until we meet space. Is this making sense?
 
Upvote 0
Ok.
There are 196 countries in the world, I think...plus all the extra extensions. So, it's safe to say in excess of 200 web extensions. Also, in your example, it could also read
Code:
mrexcel.com.uk and he also uses a website called wikipedia.org.au
which significantly adds to the problem.
You would have to loop through every sentence in every cell looking for those 200+ extensions.

Consider Italy for instance, their extension is .It.....there is endless possibilities for a sentence following a full stop starting with It...same applies for Antillies .an and American Samoa .as

Having said that, I think your request is possible, but it's going to be a bit of a brain buster for the person that creates the code !!
 
Last edited:
Upvote 0
Hello Michael,

That's a good question, but Its OK even if we get just the website name, for E.g. even if the sentence reads mrexcel.com.uk its OK if we just get "mrexcel.com" and not "mrexcel.com.uk". Does this help?
 
Upvote 0
Nope !!
My comments in post #6 remain the same...removing the last extension only makes the almost impossible,.... nearly, almost impossible !!
 
Upvote 0
Have given this a bit of thought....
Have you considered using text to columns to seperate the data into individual words, using the fixed width ( space) as the delimiter.
The webpages will then be in individual cells.
A macro can then be written to delete all cells that don't have a full stop in the cell.
Something to think about, maybe !!!
 
Upvote 0
Wow Michael, that sounds great, I first had to understand what is 'Text to Columns' and then I understood your logic, it is almost there. I will do some learning on how to write a macro for the remaining part. My only concern here is, what if there are two sentences in one row(Separated by a '.')?

Else, your logic is perfect. Thanks a Ton! If you get any ideas on my question above then please post it. I appreciate your time.
 
Upvote 0

Forum statistics

Threads
1,224,592
Messages
6,179,786
Members
452,942
Latest member
VijayNewtoExcel

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top