Pulling data points from multiple webpages

Jimmywinvests

New Member
Joined
Feb 15, 2016
Messages
22
Hello Exel-sperts (yes i love bad puns),

I have spent weeks reading forums and watching videos only to fail at creating the spreadsheet I set out to; I am hoping that someone on here might be able to help me.
I need to create a spreadsheet which pulls multiple data points (from a few web pages) off yahoo finance. The method however would need to include a macro button which when pressed would refresh the data, as well as work for multiple assessments.

The data I wish to extract (using the company "Telstra" as an example) include the following:


Summary Page
https://au.finance.yahoo.com/q?s=TLS.AX
Company Name
share price
Market Cap (mil)
Balance Sheet Quartly Page
https://au.finance.yahoo.com/q/bs?s=TLS.AX
Cash and Cash equilavents (of most recent quarter - i.e. left column)
Cash from short term investments (of most recent quarter - i.e. left column)
Cash Flow Annual Page
https://au.finance.yahoo.com/q/cf?s=TLS.AX&annual
Cash from operating activities (most recent annual figure)
Capital expenditures (most recent annual figure)
Key Statistics Page
https://au.finance.yahoo.com/q/ks?s=TLS.AX
shares outstanding
return on equity
revenue growth rate
Total debt
Operating
Forward PE Ratio (optional - include if easy)

<tbody>
</tbody>


The idea is that this data is scraped off the above websites using the ticker "TLS.AX" as the identifier. Therefore, the headers should populate the first row in the spreadsheet.

The second part to this is that I would need this to work on multiple companies at a time (for example if i listed the circa 2100 companies listed on the ASX in column "A", ideally it would populate all the above listed data for every company; thus allowing the filtering of companies by data point.

Any help is very greatly appreciated. (I am not sure where else to turn to - at this point I even have doubts that excel can fulfill the requirements).

Thank-you to anybody who takes the time to read my problem!

- Jimmy
 
Looks like everything is working beautifully! - aside from it randomly stopping at different increments.

Earlier where you said to change the application to from 2 to 3 seconds... Should I do this wherever it occurs throughout the macro?

thanks again for everything. wish I could send you a gift basket
 
Upvote 0

Excel Facts

Square and cube roots
The =SQRT(25) is a square root. For a cube root, use =125^(1/3). For a fourth root, use =625^(1/4).
I'm not sure why it randomly stops unless its just not allowing enough time to download everything or if the webpage is taking a little bit longer to respond. That is beyond me. You could start a new post and ask for any help speeding up the macro and ask why it randomly stops. Other than that I'm really not sure. Glad you like it.
Oh, yes everywhere it says application.wait....2, change the 2 to a 3 and see if that helps.

Mike
 
Upvote 0
I also made a sheet to try and predict the probability of a stock being profitable using the historical data. You may find this useful or you may not care for it at all. The one I did, has an accuracy of 83%. I'll post several parts of how to set up the sheet followed by macros, and a function. Compare column G (actual) to column M (predicted).


Excel 2003
ABCDEFG
1OpenHighLowVolumeCloseProfitProfit
22.452.452.18335,2002.300
32.412.412.3582,6002.400
42.42.462.31111,7002.400
52.432.482.4108,4002.400
62.422.452.41113,6002.4311
72.392.432.35118,8002.3900
82.392.42.361,3002.300
92.322.432.380,6002.3711
102.42.42.25142,6002.2800
112.242.332.24278,1002.3311
122.052.252.0560,7002.211
132.012.041.9975,9002.0411
142.012.041.9935,0002.0411
152.022.072116,3002.0100
162.052.06260,2002.0300
172.142.152.04137,9002.100
182.062.112.0439,2002.1111
192.042.052.0149,4002.0511
202.052.082.0345,5002.0611
212.062.082.0365,1002.0811
222.072.072.02491,1002.0500
232.082.092.0262,1002.0700
242.082.082.0214,0002.0800
252.082.082.0235,4002.0800
262.082.082.0802.0800
272.082.112.05119,6002.0800
282.012.071.98143,1002.0511
292.062.081.9540,9001.9600
301.982.031.9883,300211
311.961.981.9615,1001.9811
321.981.981.9355,4001.9300
3322.152157,3002.0211
341.9821.9865,500211
351.981.981.9733,9001.9800
361.961.981.9422,1001.9811
371.941.961.9128,3001.9611
381.911.971.8990,1001.9511
391.961.981.9565,1001.9600
401.991.991.9543,6001.9600
411.971.981.9588,8001.9600
421.931.971.93114,2001.9711
431.931.931.9301.9300
441.931.931.8831,8001.9300
451.921.931.945,9001.9311
461.91.961.8795,5001.9211
471.91.91.901.900
481.91.91.901.900
491.91.91.8645,7001.900
501.891.911.8886,9001.911
511.851.881.8566,8001.8811
521.851.851.8588,6001.8500
531.851.881.8572,2001.8500
541.891.891.83159,1001.8500
551.841.881.8466,8001.8400
561.831.861.8108,8001.8511
571.841.841.79225,5001.8300
581.81.841.878,1001.8211
591.781.851.78351,1001.811
601.711.941.71214,6001.7611
611.621.651.61138,0001.6411
621.591.631.59193,0001.6111
631.621.631.59210,5001.600
641.651.651.59330,1001.6300
651.681.681.59260,8001.6500
661.671.691.63161,4001.6911
671.681.71.6565,2001.6500
Sheet1



Excel 2003
IJKLM
1LogitExp(Logit)P(X)P(Y)Predict Profit
2-11.73298.02521E-068.02515E-060.9999920
3-4.287240.0137427450.0135564420.9864440
4-0.803790.4476312310.309216340.6907840
50.2170091.2423557060.5540404240.445961
6-0.649350.522384140.3431355640.3431360
7-0.754930.4700447190.3197485850.6802510
8-4.330340.013163020.0129920060.9870080
95.420524225.99743660.9955946640.9955951
10-7.231250.0007236160.0007230930.9992770
114.09326359.935123710.9835891040.9835891
1213.61232816120.27770.9999987750.9999991
130.3784711.4600503340.5935042520.5935041
140.5202351.6824223430.6272026280.6272031
151.7013155.4811483330.8457063550.1542941
16-2.098670.122619270.1092260510.8907740
17-4.168710.0154721450.0152364050.9847640
181.8510476.3664820910.8642499920.864251
19-1.417320.2423616260.1950813840.1950810
200.3663361.4424401240.5905733820.5905731
21-0.784370.4564075460.3133790040.3133790
22-4.397090.0123130780.012163310.9878370
23-2.500650.0820313910.0758123950.9241880
24-3.080060.0459565530.0439373440.9560630
25-3.154230.0426710960.040924790.9590750
26-1.187920.3048537330.2336305790.7663690
27-0.28590.7513386460.4290082030.5709920
282.0766567.9777462230.888613470.8886131
29-3.158640.0424836370.040752330.9592480
302.54773712.778155940.9274213470.9274211
310.6045021.8303404890.6466856180.6466861
32-2.622530.0726191610.0677026510.9322970
339.69374516215.866770.9999383360.9999381
340.3710581.4492673280.5917146370.5917151
35-1.318930.2674208220.2109960770.7890040
36-0.03430.9662840340.4914264760.4914261
37-0.958180.3835894520.2772422490.2772420
382.86138917.485792060.9459044010.9459041
390.1239281.1319347880.5309425010.4690571
40-2.303730.0998851610.0908141730.9091860
41-1.040990.3531056410.2609594040.7390410
421.841396.3052942780.8631129750.8631131
43-0.747280.473652180.3214138220.6785860
44-2.393850.0912779530.0836431750.9163570
45-0.745410.4745388980.3218218920.3218220
462.56477912.997790910.9285601560.928561
47-0.659150.5172888830.3409297260.659070
48-0.659150.5172888830.3409297260.659070
49-2.046630.1291696960.114393520.8856060
500.2541.2891715590.5631607440.5631611
511.4945664.457403540.816762680.8167631
52-0.819370.4407090570.3058973320.6941030
531.4758494.3747503740.8139448480.1860551
54-3.024850.0485653350.0463159830.9536840
552.2700689.6800564830.9063675360.0936321
560.4859371.6256976760.6191488420.6191491
57-2.800850.0607586430.057278480.9427220
582.34840510.468859120.9128073690.9128071
593.69928640.418438930.9758561640.9758561
6016.3160512189014.950.99999991811
611.6161615.0337293290.8342650220.8342651
622.56704713.027299890.9287104430.928711
63-0.741920.4761993510.3225847180.6774150
64-2.912520.0543384350.0515379440.9484620
65-3.682260.0251661440.0245483560.9754520
66-0.279760.7559678370.4305134870.4305131
670.3315751.3931608930.5821425950.4178571
Sheet1
Cell Formulas
RangeFormula
I2=$O$1+$O$2*A2+$O$3*B2+$O$4*C2+$O$5*D2
J2=EXP(I2)
K2=J2/(1+J2)
L2=IF(G2=1,K2,1-K2)



Excel 2003
NOPQR
1Constant4.922300843
2A-108.2769669
3B74.61252287
4C30.72683636
5D-3.46611E-06
6
7Product5.2995E-12
8LL-25.96340904
9In RangeMinMax
10Open1.68Yes1.592.45
11High1.7Yes1.632.48
12Low1.65Yes1.592.41
13Volume65200Yes0491100
14Logit0.331575189
15e^Logit1.393160893
16
17Probability58.21%
18Classify1
19
Sheet1
Cell Formulas
RangeFormula
O1=T2
O2=U2
O3=V2
O4=W2
O5=X2
O7=PRODUCT(L:L)
O8=LN(O7)
O14=O1+O2*O10+O3*O11+O4*O12+O5*O13
O15=EXP(O14)
O17=O15/(1+O15)
O18=IF(O17>=T26,1,0)
P10=IF(AND(O10>=Q10, O10<=R10),"Yes","No")
P11=IF(AND(O11>=Q11, O11<=R11),"Yes","No")
P12=IF(AND(O12>=Q12, O12<=R12),"Yes","No")
P13=IF(AND(O13>=Q13, O13<=R13),"Yes","No")
Q10=MIN(A:A)
Q11=MIN(B:B)
Q12=MIN(C:C)
Q13=MIN(D:D)
R10=MAX(A:A)
R11=MAX(B:B)
R12=MAX(C:C)
R13=MAX(D:D)



Excel 2003
STUVWX
1 Beta0Beta1Beta2Beta3Beta4
2Coeff4.922300843-108.27774.6125230.72684-3.46611E-06
3SE(Beta)4.09993864228.3470523.543918.504537.13414E-06
4z-stat1.200579148-3.8196913.1690811.660504-0.485847492
5p-value22.99%0.01%0.15%9.68%62.71%
6McFaddenR242.91%
7Cox&SnellR244.64%
8Iterations6
9LR39.02240141
10LR p-value6.89262E-08
11xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
12ActualResponse
13PredictionPositiveNegative
14Positive256
15Negative530
16xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
17Accuracy83.33%
18ErrorRate16.67%
19HitRate83.33%
20TrueNegRate83.33%
21FalsePos16.67%
22Precision80.65%
23NegPredVal85.71%
24FalseDiscover19.35%
25xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
26CutOff43.00%
27Accuracy83.33%
Sheet1
Cell Formulas
RangeFormula
T26=INDEX(S29:S119,MATCH(MAX(T29:T119),T29:T119,0))
T27=(T14+U15)/(T14+T15+U14+U15)
S1:X25{=logit(G2:G67,A2:D67,T26,TRUE,TRUE)}
Press CTRL+SHIFT+ENTER to enter array formulas.


Code:
Sub Maximize_CutOff_Value_Logistic_Regression_Example()
Dim i As Single, RowNumber As Integer
RowNumber = 29
Application.ScreenUpdating = False
With ActiveSheet
    For i = 0.1 To 1 Step 0.01
    .Cells(26, 20) = i
    .Cells(RowNumber, 19) = i
    .Cells(RowNumber, 20) = .Cells(27, 20)
    RowNumber = RowNumber + 1
Next i
.Cells(26, 20).Value = "=Index(S29:S119,Match(Max(T29:T119),T29:T119,0))"
End With
Application.ScreenUpdating = True
End Sub

Code:
Sub Predict_Y()
With ActiveSheet
Application.ScreenUpdating = False
lastRow = Range("A" & Rows.Count).End(xlUp).Row
For R = 2 To lastRow
    Range("O10").Value = Range("A" & R).Value
    Range("O11").Value = Range("B" & R).Value
    Range("O12").Value = Range("C" & R).Value
    Range("O13").Value = Range("D" & R).Value
    Range("M" & R).Value = Range("O18").Value
Next R
End With
Application.ScreenUpdating = True
End Sub

Function:
Code:
Option Explicit
Option Base 1

Function Logit(known_y As Range, known_x As Range, Cutoff As Double, Optional Constant As Boolean = True, Optional Stats = False)
    Dim Intercept As Double
    If Constant = True Then
     Intercept = 1
    ElseIf Constant = False Then
     Intercept = 0
    End If

Dim M As Integer:  M = known_x.Columns.Count + Intercept ' number of independent variables
Dim N As Integer:  N = known_x.Rows.Count ' number of observations
Dim IndVar() As Double:  ReDim IndVar(1 To N, 1 To M) As Double
Dim ii As Integer:  ii = 1
Dim jj As Integer:  jj = 1
Dim kk As Integer:  kk = 1
Dim y_bar As Double:  y_bar = 0

For ii = 1 To N
    y_bar = y_bar + known_y(ii) ' calculate the average y
    If Intercept = 1 Then
     IndVar(ii, 1) = 1 ' create intercept
    End If
    For jj = 1 + Intercept To M
     IndVar(ii, jj) = known_x(ii, jj - Intercept) ' load in independent variables
    Next jj
    Next ii
    y_bar = y_bar / N ' average y for calculation of r2

Dim MaxIt As Integer:   MaxIt = 100 ' max number of itertions in Newton Algorithim
Dim cc As Integer:  cc = 1 ' main loop counter
Dim epsilon As Double:  epsilon = 0.000001 ' convergence criteriaq of Newton Algorithim
Dim Err As Double:  Err = 1 ' measure of convergence of Newton Algorithim
Dim y_hats() As Double: ReDim y_hats(1 To N) As Double ' model forecast
Dim Betas() As Double:  ReDim Betas(1 To M) As Double ' estimated betas
Dim z() As Double:  ReDim z(1 To N) As Double
Dim J() As Double:  ReDim J(1 To M) As Double ' Jacobian
Dim H() As Double:  ReDim H(1 To M, 1 To M) As Double ' Hessian
Dim Newt() As Variant:  ReDim Newt(1 To M) As Variant ' Newton Gain
Dim LogLikelihood As Double:    LogLikelihood = 1
Dim LogLikelihoodP As Double:   LogLikelihoodP = 1

Do While cc < MaxIt
 For ii = 1 To N
  For jj = 1 To M
    z(ii) = z(ii) + Betas(jj) * IndVar(ii, jj)
Next jj
y_hats(ii) = 1 / (1 + Exp(-1 * z(ii))) ' model estimate
For jj = 1 To M
 J(jj) = J(jj) + (known_y(ii) - y_hats(ii)) * IndVar(ii, jj)  ' Jacobian
  For kk = 1 To M
   H(jj, kk) = H(jj, kk) - y_hats(ii) * (1 - y_hats(ii)) * IndVar(ii, jj) * IndVar(ii, kk) ' Hessian
Next kk
Next jj
LogLikelihood = LogLikelihood + (known_y(ii) * Log(y_hats(ii)) + (1 - known_y(ii)) * Log(1 - y_hats(ii)))
Next ii

If Abs(LogLikelihood - LogLikelihoodP) < epsilon Then Exit Do ' check if converged, exit if true
LogLikelihoodP = LogLikelihood

Newt = Application.WorksheetFunction.MMult(J, Application.WorksheetFunction.MInverse(H))
For jj = 1 To M
 Betas(jj) = Betas(jj) - Newt(jj)
Next jj

ReDim J(1 To M): ReDim H(1 To M, 1 To M): ReDim z(1 To N) As Double ' clear Jacobian and Hessian matrix
LogLikelihood = 0
cc = cc + 1
Loop

If Stats = False Then ' if stats not selected the output betas and labels
  ReDim Output(1 To 2, 1 To M) As Variant
    For ii = 1 To M
      Output(1, ii) = "Beta" & (ii - 1)
      Output(2, ii) = Betas(ii)
    Next ii
  Logit = Output
  Exit Function
End If

Dim HInv() As Variant: ReDim HInv(1 To M, 1 To M) As Variant
Dim Tstat() As Double: ReDim Tstat(1 To M) As Double
HInv = Application.WorksheetFunction.MInverse(H)
ReDim Output(1 To 25, 1 To M + 1) As Variant
For ii = 1 To M
  Output(1, ii + 1) = "Beta" & (ii - 1) ' label
  Output(2, ii + 1) = Betas(ii) ' betas
  Output(3, ii + 1) = Sqr(-HInv(ii, ii)) ' standard errors
  Output(4, ii + 1) = Output(2, ii + 1) / Output(3, ii + 1)
  Output(5, ii + 1) = (1 - Application.WorksheetFunction.NormSDist(Abs(Output(4, ii + 1)))) * 2 ' p-value
Next ii
  Output(1, 1) = ""
  Output(2, 1) = "Coeff"
  Output(3, 1) = "SE(Beta)"
  Output(4, 1) = "z-stat"
  Output(5, 1) = "p-value"

Dim LogLikelihood0 As Double: LogLikelihood0 = N * (y_bar * Log(y_bar) + (1 - y_bar) * Log(1 - y_bar))
Output(6, 1) = "McFaddenR2":    Output(6, 2) = 1 - LogLikelihood / LogLikelihood0
Output(7, 1) = "Cox&SnellR2":   Output(7, 2) = 1 - Exp(-2 / N * (LogLikelihood - LogLikelihood0))
Output(8, 1) = "Iterations":    Output(8, 2) = cc - 1
Output(9, 1) = "LR":        Output(9, 2) = 2 * (LogLikelihood - LogLikelihood0)
Output(10, 1) = "LR p-value":   Output(10, 2) = Application.WorksheetFunction.ChiDist(Output(9, 2), M - 1) ' p-value for LR

' Contingency table
Dim N_TP As Integer:  Dim N_TN As Integer
Dim N_FP As Integer:  Dim N_FN As Integer

For ii = 1 To N
  If known_y(ii) = 1 And (y_hats(ii) - Cutoff) > 0 Then
    N_TP = N_TP + 1
  ElseIf known_y(ii) = 0 And (y_hats(ii) - Cutoff) <= 0 Then
    N_TN = N_TN + 1
  ElseIf known_y(ii) = 0 And (y_hats(ii) - Cutoff) > 0 Then
    N_FP = N_FP + 1
  ElseIf known_y(ii) = 1 And (y_hats(ii) - Cutoff) <= 0 Then
    N_FN = N_FN + 1
  End If
Next ii

Output(12, 1) = "": Output(12, 2) = "ActualResponse":   Output(13, 1) = "Prediction"
Output(13, 2) = "Positive": Output(13, 3) = "Negative"
Output(14, 1) = "Positive": Output(15, 1) = "Negative"
Output(14, 2) = N_TP:       Output(14, 3) = N_FP
Output(15, 2) = N_FN:       Output(15, 3) = N_TN

Output(17, 1) = "Accuracy": Output(17, 2) = (N_TP + N_TN) / (N_TP + N_TN + N_FP + N_FN)
Output(18, 1) = "ErrorRate":    Output(18, 2) = 1 - Output(17, 2)
Output(19, 1) = "HitRate":  Output(19, 2) = N_TP / (N_TP + N_FN)
Output(20, 1) = "TrueNegRate":  Output(20, 2) = N_TN / (N_TN + N_FP)
Output(21, 1) = "FalsePos": Output(21, 2) = 1 - Output(20, 2)

Output(22, 1) = "Precision":        If N_TP + N_FP = 0 Then Output(22, 2) = "Error" Else Output(22, 2) = N_TP / (N_TP + N_FP)
Output(23, 1) = "NegPredVal":       If N_TN + N_FP = 0 Then Output(23, 2) = "Error" Else Output(23, 2) = N_TN / (N_TN + N_FN)
Output(24, 1) = "FalseDiscover":    If N_FP + N_TP = 0 Then Output(24, 2) = "Error" Else Output(24, 2) = N_FP / (N_FP + N_TP)

For ii = 1 To M + 1
  Output(11, ii) = "xxxxx": Output(16, ii) = "xxxxx":   Output(25, ii) = "xxxxx"
Next ii
For ii = 3 To M + 1
    Output(6, ii) = "": Output(7, ii) = "": Output(8, ii) = ""
    Output(9, ii) = "": Output(10, ii) = "":    Output(12, ii) = ""
    Output(17, ii) = "":    Output(18, ii) = "":    Output(19, ii) = ""
    Output(20, ii) = "":    Output(21, ii) = "":    Output(22, ii) = ""
    Output(23, ii) = "":    Output(24, ii) = "":
Next ii
For ii = 4 To M + 1
    Output(13, ii) = "":    Output(14, ii) = "":    Output(15, ii) = ""
Next ii
Logit = Output

End Function

Hope you like.

Mike
 
Upvote 0
I was not sure if you wanted to add a "probability of profit" column to the final summary sheet. Also, do not put too much "stock", no pun intended, into the above logistic regression. Let me know what you think.
 
Upvote 0
I'm thinking this might work well as a second spreadsheet. Additionally I think I will have a go at copy pasting the current price sections of the macro into its own spreadsheet (as this value changes much more frequently and it seems near impossible to run the whole macro everyday!
 
Upvote 0

Forum statistics

Threads
1,214,632
Messages
6,120,649
Members
448,975
Latest member
sweeberry

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top