Data extraction of multiple websites

extactor

New Member
Joined
Dec 9, 2022
Messages
7
Office Version
  1. 2019
Platform
  1. Windows
Hey guys,

I am looking for a program that extracts URLs, locations, Instagram, Facebook, phone numbers, and emails of different hotels all around the world that I just have the name for and can get the URLs myself if I need to. All hotels have different websites, would it be possible to extract data from all of them with just one program without writing code for each one?

Thanks for the help!
 

Excel Facts

Waterfall charts in Excel?
Office 365 customers have access to Waterfall charts since late 2016. They were added to Excel 2019.
I tried using a web extractor but had to manually activate it for each website.
 
Upvote 0
I'm sure there are many here more skilled, but offhand I'd say the short answer is no. If more than one web site is using some common third party package to generate the page(s) you're interested in, you might be able to re-use the code for the first one to the next, but typically every web site will have different characteristics, and even the pages built as noted could have different names for the table/section you're working with.
My suggestion is to first create a table of the URLs. You might want to add an Index column from 0 that's incorporated into the table to keep it permanent rather than using Index in PQ.
Book3
AB
1IndexURL
20https://www.google.com
31https://www.microsoft.com
42https://www.yahoo.com
Sheet1
Cell Formulas
RangeFormula
A3:A4A3=A2+1
Pull that into PQ:
Query name: URLs:
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="URLs"]}[Content],
    ChangedType = Table.TransformColumnTypes(Source,{{"Index", Int64.Type}, {"URL", type text}})
in
    ChangedType
Then create a new query from the first row like this
Query name URL0:
Power Query:
let
    Source =  Web.Page(Web.Contents(URLs[URL]{0}))
in
    Source
This is a sample and only gets you started. As I said, having an Index column in the URLs table just keeps that Index number tied to the URL, and if you use a naming convention for the subsequent queries like URL0 for the first row URL, it will be easier to track. If you come up with multiple URLs that can use the same code, it's easy enough to create a function from a query and keep the main URL data in separate queries.

It's a start!
 
Upvote 0
I'm sure there are many here more skilled, but offhand I'd say the short answer is no. If more than one web site is using some common third party package to generate the page(s) you're interested in, you might be able to re-use the code for the first one to the next, but typically every web site will have different characteristics, and even the pages built as noted could have different names for the table/section you're working with.
My suggestion is to first create a table of the URLs. You might want to add an Index column from 0 that's incorporated into the table to keep it permanent rather than using Index in PQ.
Book3
AB
1IndexURL
20https://www.google.com
31https://www.microsoft.com
42https://www.yahoo.com
Sheet1
Cell Formulas
RangeFormula
A3:A4A3=A2+1
Pull that into PQ:
Query name: URLs:
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="URLs"]}[Content],
    ChangedType = Table.TransformColumnTypes(Source,{{"Index", Int64.Type}, {"URL", type text}})
in
    ChangedType
Then create a new query from the first row like this
Query name URL0:
Power Query:
let
    Source =  Web.Page(Web.Contents(URLs[URL]{0}))
in
    Source
This is a sample and only gets you started. As I said, having an Index column in the URLs table just keeps that Index number tied to the URL, and if you use a naming convention for the subsequent queries like URL0 for the first row URL, it will be easier to track. If you come up with multiple URLs that can use the same code, it's easy enough to create a function from a query and keep the main URL data in separate queries.

It's a start!
You are certainly more skilled than I am! I had some hope but knew it was a difficult task given the multiple websites with different formatting and location of data. Anyways, it is for sure a good start but could you help me understand what your code does, I am still learning! From what I know it would deliver all of the website's content and probably in raw HTML format correct?

Thanks for the help!
 
Upvote 0
You are certainly more skilled than I am! I had some hope but knew it was a difficult task given the multiple websites with different formatting and location of data. Anyways, it is for sure a good start but could you help me understand what your code does, I am still learning! From what I know it would deliver all of the website's content and probably in raw HTML format correct?

Thanks for the hel
Give me a URL. What you get varies immensely, but a sample URL is a start.
 
Upvote 0
for example: Berlin
If that's where you're looking to pull data from, Excel is not the tool you should be using - with or without VBA. As noted, Power Automate is the only Microsoft Product I'm aware of that MIGHT be able to tackle the job, but I suspect you'd need some pretty sophisticated web scraping tools to get there.
To see if you can get Power Automate, go to office.com, log in with your Microsoft credentials. It should be in the Apps, but will only be Web based, BUT can save data to Excel which can then be accessed on the Desktop through OneDrive.
Good luck!
 
Upvote 0
If that's where you're looking to pull data from, Excel is not the tool you should be using - with or without VBA. As noted, Power Automate is the only Microsoft Product I'm aware of that MIGHT be able to tackle the job, but I suspect you'd need some pretty sophisticated web scraping tools to get there.
To see if you can get Power Automate, go to office.com, log in with your Microsoft credentials. It should be in the Apps, but will only be Web based, BUT can save data to Excel which can then be accessed on the Desktop through OneDrive.
Good luck!
Alright, will check it out! thanks for the help
 
Upvote 0

Forum statistics

Threads
1,215,155
Messages
6,123,332
Members
449,098
Latest member
thnirmitha

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top