Automatically extract website data to a spreadsheet

Corried

Board Regular
Joined
Dec 19, 2019
Messages
217
Office Version
  1. 2019
Platform
  1. Windows
  2. Web
Hello, Thank you for sharing..

I am looking for courses on how to automatically extract data from a website to an excel spreadsheet.

The excel file contains fifty links. Each link has the same process, to create a stock market report for fifty companies. Use macro and chrome to generate data.

Can anyone show me a step by step process on how I can do this?

Please watch the video and Excel workbook below to learn more.



I hope you all understand.

Cheers
 
Concerning the Hong Kong site, the jockey table was not available when you tested, try it at a moment when it is there.

My idea using another page is to determine if the MSN site has some scraping blocking mechanism or something, because I was also getting errors sometimes.
Sorry for late reply. Do you think an addon came prevent the same from executing...
 
Upvote 0

Excel Facts

Wildcard in VLOOKUP
Use =VLOOKUP("Apple*" to find apple, Apple, or applesauce
Concerning the Hong Kong site, the jockey table was not available when you tested, try it at a moment when it is there.

My idea using another page is to determine if the MSN site has some scraping blocking mechanism or something, because I was also getting errors sometimes.
Hello Wolf. I am not sure what seems to be the problem on my end. Do you know someone who can help me with my problems?
Is there any options on your end so I can try?

Thanks in advance

Corrie
 
Upvote 0
I would like to know what happens when you run the code from post #37.
The page should look as below:

hk.png
 
Upvote 0
The jockey table was not available when you recorded the video, but is there right now.
Can you try again at this moment?

Maybe we should test with another site, but for me it is important that it is not the MSN site, I want to separate the problems.
 
Upvote 0
The jockey table was not available when you recorded the video, but is there right now.
Can you try again at this moment?

Maybe we should test with another site, but for me it is important that it is not the MSN site, I want to separate the problems.
What would you recommend I test. I did not use the MSN site but Chrome
 
Upvote 0
The jockey table was not available when you recorded the video, but is there right now.
Can you try again at this moment?

Maybe we should test with another site, but for me it is important that it is not the MSN site, I want to separate the problems.
I think I got something... Completed.webm
 
Upvote 0
All right, you seem to have a functional Selenium installation.

It is time to go back to the MSN Money site.

Take a look below at a line from post #29 and the code I am testing today; note that the CSS strings are different. I do not know if this is intentional to prevent scraping.

I suggest that when the code you are using cannot find a page element, you get the CSS or XPath for that element and replace it in the code to see if this solves the issue.

Do you know how to do that?

If the HTML keeps changing, we cannot have stable scraping code.

The key point will be to confirm that the HTML is been modified, as shown below.

VBA Code:
[b2] = driver.FindElementByCss("#root > div:nth-child(1) > div > div.mainContentLayout-DS-EntryPoint1-1 > div > div > div:nth-child(1) > div > div.firstSection-DS-EntryPoint1-1 > div > div.title-DS-EntryPoint1-1 > span.displayNameWithBtn-DS-EntryPoint1-1").Text

'[b2] = driver.FindElementByCss("#root > div:nth-child(1) > div > div.mainContentLayout-DS-EntryPoint1-1 > div > div > div:nth-child(1) > div > div.firstSection-DS-EntryPoint1-1 > div.quoteInfo-DS-EntryPoint1-1 > div.title-DS-EntryPoint1-1 > span.displayName-DS-EntryPoint1-1").Text
 
Upvote 0
The code below worked for me today, please try it with BTG.

If you get an error, tell me on what line, I want to see if the string changed.

VBA Code:
Public driver As New ChromeDriver
Sub Money29()
Dim pt As WebElement, key As New Selenium.Keys, s$
driver.get "https://www.msn.com/en-us/money/stockdetails/fi-bjqejc?duration=1D"
Application.Wait Now + TimeValue("0:00:06")
Set pt = driver.FindElementByXPath("//*[@id=""searchBox""]/input")
pt.Click
pt.SendKeys ActiveSheet.[c2]
Application.Wait Now + TimeValue("0:00:06")
DoEvents
Set pt = driver.FindElementByCss("#searchBox > span > span > svg")
pt.Click
Application.Wait Now + TimeValue("0:00:06")
DoEvents
[a2] = driver.URL
[b2] = driver.FindElementByCss("#root > div:nth-child(1) > div > div.mainContentLayout-DS-EntryPoint1-1 > div > div > div:nth-child(1) > div > div.firstSection-DS-EntryPoint1-1 > div > div.title-DS-EntryPoint1-1 > span.displayNameWithBtn-DS-EntryPoint1-1").Text

[d2] = driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[2]/div[2]/div[8]/div[2]").Text
[e2] = driver.FindElementByCss("#root > div:nth-child(1) > div > div.mainContentLayout-DS-EntryPoint1-1 >" & _
" div > div > div:nth-child(1) > div > div.secondSection-DS-EntryPoint1-1 > div.priceInfo-DS-EntryPoint1-1 > div >" & _
" div.mainPrice.color_red-DS-EntryPoint1-1").Text
s = driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[2]/div[1]/div[2]/div/div[3]/span[2]").Text
s = s & " / " & driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[2]/div[1]/div[2]/div/div[3]/span[1]").Text
[h2] = s
[j2] = driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[2]/div[2]/div[5]/div[2]").Text
' 5-year
driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[1]/div[2]/div/div/button[8]").Click
Application.Wait Now + TimeValue("0:00:06")
[i2] = driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[2]/div/table/tbody/tr[2]/td[12]/div").Text
' analysis
driver.FindElementByCss("#root > div:nth-child(1) > div > div.mainContentLayout-DS-EntryPoint1-1 > div > div" & _
" > div:nth-child(1) > div > div.secondSection-DS-EntryPoint1-1 > div.navigation-DS-EntryPoint1-1 > div > div > button:nth-child(5) > span").Click
Application.Wait Now + TimeValue("0:00:10")
driver.SwitchToNextWindow
s = driver.FindElementByXPath _
("//*[@id=""main""]/div[2]/div[2]/div[2]/div[3]/div/div/div[5]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div/div[2]/div/ul[2]/li[2]/p").Text
s = s & " / " & driver.FindElementByXPath _
("//*[@id=""main""]/div[2]/div[2]/div[2]/div[3]/div/div/div[5]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div/div[2]/div/ul[3]/li[2]/p").Text
[g2] = s
[L2] = driver.FindElementByXPath("//*[@id=""main""]/div[2]/div[2]/div[2]/div[3]/" & _
"div/div/div[5]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div/div[2]/div/ul[6]/li[2]/p").Text
[k2] = driver.FindElementByXPath _
("//*[@id=""main""]/div[2]/div[2]/div[2]/div[4]/div/div/div[5]/div[1]/div[2]/div[1]/div[1]/div/div/div/ul[1]/li[2]/span[1]/p").Text
' growth
Application.Wait Now + TimeValue("0:00:12")
Set pt = driver.FindElementByCss _
("#main > div.content-div.fullwidth.loaded > div.main-region.maincontainer.fullwidth.stckdtl" _
& " > div.dynaloadable > div:nth-child(4) > div > div > div.key-stats-area > ul > li:nth-child(2)")
pt.Click
Application.Wait Now + TimeValue("0:00:06")
[F2] = driver.FindElementByXPath("//*[@id=""main""]/div[2]/div[2]/div[2]/div[3]/" & _
"div/div/div[5]/div[1]/div[2]/div[2]/div[1]/div/div/div/ul[1]/li[2]/span[1]/p").Text
' company
driver.FindElementByXPath("//*[@id=""profile""]/a").Click
Application.Wait Now + TimeValue("0:00:06")
driver.FindElementByXPath("//*[@id=""main""]/div[2]/div[2]/div[2]/div/div[3]/div/div[2]/div[2]/div/button[1]").Click
Application.Wait Now + TimeValue("0:00:06")
[m2] = driver.FindElementByXPath("//*[@id=""main""]/div[2]/div[2]/div[2]/div/div[3]/div/div[2]/div[1]").Text
MsgBox "Success!"
End Sub
 
Upvote 0
The code below worked for me today, please try it with BTG.

If you get an error, tell me on what line, I want to see if the string changed.

VBA Code:
Public driver As New ChromeDriver
Sub Money29()
Dim pt As WebElement, key As New Selenium.Keys, s$
driver.get "https://www.msn.com/en-us/money/stockdetails/fi-bjqejc?duration=1D"
Application.Wait Now + TimeValue("0:00:06")
Set pt = driver.FindElementByXPath("//*[@id=""searchBox""]/input")
pt.Click
pt.SendKeys ActiveSheet.[c2]
Application.Wait Now + TimeValue("0:00:06")
DoEvents
Set pt = driver.FindElementByCss("#searchBox > span > span > svg")
pt.Click
Application.Wait Now + TimeValue("0:00:06")
DoEvents
[a2] = driver.URL
[b2] = driver.FindElementByCss("#root > div:nth-child(1) > div > div.mainContentLayout-DS-EntryPoint1-1 > div > div > div:nth-child(1) > div > div.firstSection-DS-EntryPoint1-1 > div > div.title-DS-EntryPoint1-1 > span.displayNameWithBtn-DS-EntryPoint1-1").Text

[d2] = driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[2]/div[2]/div[8]/div[2]").Text
[e2] = driver.FindElementByCss("#root > div:nth-child(1) > div > div.mainContentLayout-DS-EntryPoint1-1 >" & _
" div > div > div:nth-child(1) > div > div.secondSection-DS-EntryPoint1-1 > div.priceInfo-DS-EntryPoint1-1 > div >" & _
" div.mainPrice.color_red-DS-EntryPoint1-1").Text
s = driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[2]/div[1]/div[2]/div/div[3]/span[2]").Text
s = s & " / " & driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[2]/div[1]/div[2]/div/div[3]/span[1]").Text
[h2] = s
[j2] = driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[2]/div[2]/div[5]/div[2]").Text
' 5-year
driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[1]/div[2]/div/div/button[8]").Click
Application.Wait Now + TimeValue("0:00:06")
[i2] = driver.FindElementByXPath("//*[@id=""root""]/div[1]/div/div[5]/div/div/div[2]/div/div[2]/div/table/tbody/tr[2]/td[12]/div").Text
' analysis
driver.FindElementByCss("#root > div:nth-child(1) > div > div.mainContentLayout-DS-EntryPoint1-1 > div > div" & _
" > div:nth-child(1) > div > div.secondSection-DS-EntryPoint1-1 > div.navigation-DS-EntryPoint1-1 > div > div > button:nth-child(5) > span").Click
Application.Wait Now + TimeValue("0:00:10")
driver.SwitchToNextWindow
s = driver.FindElementByXPath _
("//*[@id=""main""]/div[2]/div[2]/div[2]/div[3]/div/div/div[5]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div/div[2]/div/ul[2]/li[2]/p").Text
s = s & " / " & driver.FindElementByXPath _
("//*[@id=""main""]/div[2]/div[2]/div[2]/div[3]/div/div/div[5]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div/div[2]/div/ul[3]/li[2]/p").Text
[g2] = s
[L2] = driver.FindElementByXPath("//*[@id=""main""]/div[2]/div[2]/div[2]/div[3]/" & _
"div/div/div[5]/div[1]/div[2]/div[1]/div[1]/div[1]/div[2]/div/div[2]/div/ul[6]/li[2]/p").Text
[k2] = driver.FindElementByXPath _
("//*[@id=""main""]/div[2]/div[2]/div[2]/div[4]/div/div/div[5]/div[1]/div[2]/div[1]/div[1]/div/div/div/ul[1]/li[2]/span[1]/p").Text
' growth
Application.Wait Now + TimeValue("0:00:12")
Set pt = driver.FindElementByCss _
("#main > div.content-div.fullwidth.loaded > div.main-region.maincontainer.fullwidth.stckdtl" _
& " > div.dynaloadable > div:nth-child(4) > div > div > div.key-stats-area > ul > li:nth-child(2)")
pt.Click
Application.Wait Now + TimeValue("0:00:06")
[F2] = driver.FindElementByXPath("//*[@id=""main""]/div[2]/div[2]/div[2]/div[3]/" & _
"div/div/div[5]/div[1]/div[2]/div[2]/div[1]/div/div/div/ul[1]/li[2]/span[1]/p").Text
' company
driver.FindElementByXPath("//*[@id=""profile""]/a").Click
Application.Wait Now + TimeValue("0:00:06")
driver.FindElementByXPath("//*[@id=""main""]/div[2]/div[2]/div[2]/div/div[3]/div/div[2]/div[2]/div/button[1]").Click
Application.Wait Now + TimeValue("0:00:06")
[m2] = driver.FindElementByXPath("//*[@id=""main""]/div[2]/div[2]/div[2]/div/div[3]/div/div[2]/div[1]").Text
MsgBox "Success!"
End Sub
The Code broke. I am sorry I could not chat with you earlier.
please view link with the same error as before

 
Upvote 0

Forum statistics

Threads
1,214,782
Messages
6,121,532
Members
449,037
Latest member
tmmotairi

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top