Need help using VBA to login and then download files from a website

Lighthouse6

New Member
Joined
Jun 27, 2013
Messages
11
Hi guys, I've been going around in circles for days trying to solve this problem.

I am trying to login to a site and download a file using an InternetExplorer object with VBA. The problem is that once the code clicks on the hyperlink I get a prompt from Internet Explorer (version 10 in my case): "Do you want to open or save "file.xls"? What do I do now?

Here is what I have tried:

1. URLDownloadToFile. This doesn't work because the file is not accessible without logging in.

2. Disabling the open/save prompt in Internet Explorer. Also no go. This prompt is designed as a security "feature" and is designed not to be disabled. Some people have mentioned that manually putting in registry keys could do the trick, but it has not worked for me. (Maybe I am not doing the right registry keys)?

3. Is there an download manager add in for Internet Explorer that disables the prompt? I have not found one yet.

4. Downgrading Internet Explorer from version 10 to version 7. (I'm using Windows 7, and I don't even know if it's compatible).

5. SendKeys ("%o") '(Alt + o). This should 'click' Open file. Why not use it? Because it's not reliable. The user sometimes switches the active window away. (Is there a way to "SendKeys" to a specific window?)

6. Repeating SendKeys if the file didn't open the first time.

7. Using WinHttpRequest instead of InternetExplorer. This should probably work, but it seems like it is a lot of work to learn how to use the WinHttpRequest and to rewrite all of my code. I am pretty new at vba altogether and it took me a few weeks to get this far. Besides, debugging WinHttpRequest would seem to be much more difficult because I cannot see what's happening (correct me if I'm wrong on that point). When using InternetExplorer I can watch what is happening.

8. iMacros. Theoretically, once the code enters the fields in the web site it can call iMacros to download the file. The problem with iMacros is that it takes time to load and close because it would have to load FireFox with the iMacro, then load the Web site and then close Firefox. For each one of hundreds of files!

Help! Thank you.
 
There is a button on one page of the site which seems to execute some Java code when clicked. Is there a way to to this using WinHttpRequest?
 
Upvote 0

Excel Facts

How to total the visible cells?
From the first blank cell below a filtered data set, press Alt+=. Instead of SUM, you will get SUBTOTAL(9,)
No, not using WinHttpRequest itself. You normally put the response from the WinHttpRequest object in a HTMLDocument:
Code:
Dim HTMLdoc as HTMLDocument
Set HTMLdoc = New HTMLDocument
HTMLdoc.body.innerHTML = WinHttpRequestObject.responseText
Then find the button and click it using HTMLDocument and other HTML element methods such as getElementsByTagName getElementById, Click, etc. But this might not work if the web page uses the < body o n l o a d> tag, dynamic HTML and/or Java to load or render parts of the page, which would be handled automatically if the web page was displayed by a browser.

I received your PM. Yes, the login security for the site is slightly insecure because the username and password are sent in the HTTPS URL query string (the part which starts ?LoginUsername=). This means the username and password can appear in client side logs and the browser cache - see ssl - Is an HTTPS query string secure? - Stack Overflow. Apart from that I see no harm in posting the URL in this thread because the site is HTTPS.
 
Upvote 0
Due to the HTML rendering problem explained in my previous post, I couldn't get your code to work. The answer to your question:
Code:
    ' Run the backtest
    ' oHttp.Open ?, ???????, False
is:
Code:
    ' Run the backtest
    oHttp.Open "GET", oHttp.GetResponseHeader("Location"), False
which uses the URL from the Location header from the previous GET request's 302 redirect response. However, because the WinHttp ResponseText HTML includes a < body o n l o a d ="location='https://www.xxxx'"> tag, which loads the rest of the page when displayed in a browser, the HTML for the button to be clicked is missing.

I therefore took a different approach using HTMLDocument.CreateDocumentFromUrl to download the 2 .xls files. In the VBA project you must set references to Microsoft HTML Object Library and Microsoft XML v6.0, otherwise the following code won't compile or run.

Code:
Option Explicit

Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

Public Sub CDFU_Download_Files()

    'Requires reference to MS HTML Object Library and MS XML v6.0
    
    Dim HTMLdoc As HTMLDocument, HTMLdoc2 As HTMLDocument
    Dim BackTestLink As HTMLLinkElement
    Dim downloadLinks(1) As String, n As Integer, i As Long
    Dim URL As String
    Dim username As String, password As String
    
    username = "XXXXXX"
    password = "YYYYYY"
    URL = "https://www.ZZZZZZ.com/login.jsp?LoginUsername=" & Escape(username) & "&LoginPassword=" & Escape(password) & "&Login="

    Set HTMLdoc2 = New HTMLDocument
    Set HTMLdoc = HTMLdoc2.createDocumentFromUrl(URL, "")
    While HTMLdoc.readyState <> "complete": DoEvents: Wend
    
    'Load required data page
    
    URL = "https://www.ZZZZZZ.com/app/screen/summary/11012?st=3"
    Set HTMLdoc = HTMLdoc.createDocumentFromUrl(URL, "")
    While HTMLdoc.readyState <> "complete": DoEvents: Wend
    
    'Find and click the 'Run Backtest' button
    
    Set BackTestLink = HTMLdoc.getElementById("runBacktest")
    BackTestLink.Click
    While HTMLdoc.readyState <> "complete": DoEvents: Wend
    
    'Wait until the 2 download links exist - Statistics and Backtest results
    
    n = 0
    Do
        i = 0
        While i < HTMLdoc.Links.Length And n < 2
            If InStr(HTMLdoc.Links(i).href, "DownloadScreenBacktestResults") > 0 Then
                downloadLinks(n) = HTMLdoc.Links(i).href
                n = n + 1
            End If
            i = i + 1
        Wend
        If n < 2 Then Sleep 200
        DoEvents
    Loop While n < 2
    
    'Download the 2 .xls files
    
    Download_File "C:\backtest_daily.xls", downloadLinks(0)      'Statistics
    Download_File "C:\backtest.xls", downloadLinks(1)            'Backtest results
    
End Sub


Private Sub Download_File(localFile As String, URL As String)

    'Download a file using XMLhttp
    
    Dim XMLreq As MSXML2.XMLHTTP
    Dim fileNum As Integer, bytes() As Byte
    
    Set XMLreq = New MSXML2.XMLHTTP
    With XMLreq
        .Open "GET", URL, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 5.1; rv:22.0) Gecko/20100101 Firefox/22.0"
        .send
    
        If .Status = 200 Then
        
            'Successful response, so save response bytes in the local file
        
            fileNum = FreeFile
            Open localFile For Binary Access Write As #fileNum
            bytes = .ResponseBody
            Put #fileNum, , bytes
            Close #fileNum
            
            Debug.Print "Downloaded " & localFile & " from " & URL
            
        Else
            
            MsgBox "XMLhttp GET error " & .StatusText & vbCrLf & "Status = " & .Status & vbCrLf & "URL = " & URL
            
        End If
        
    End With

End Sub


Private Function Escape(ByVal URL As String) As String

    'URLs cannot contain most special characters.
    'VBScript and JavaScript have built-in Escape functions. In VB we have to write our own

    Dim i As Integer, BadChars As String
    BadChars = "<>%=&!@#£$^()+{[}]|\;:'"",/?"
    For i = 1 To Len(BadChars)
        URL = Replace(URL, Mid(BadChars, i, 1), "%" & Hex(Asc(Mid(BadChars, i, 1))))
    Next i
    URL = Replace(URL, " ", "+")
    Escape = URL
    
End Function
 
Upvote 0
John I have been studying your code, trying different variations to try to learn what I can. It will probably take some time to digest it especially as I am leaving for a vacation.

A few things puzzle me:
1. How is the Sub Download_File() able to access the site using a new XMLHTTP object without logging in? Does it somehow inherit the cookies from it's parent?
2. In one of your earlier posts you wrote: "XMLhttp is similar to WinHttpRequest but with less flexibility, i.e. there are certain options not available to XMLhttp". (I understood this to mean that WinHttpRequest does everything that XMLhttp does plus more). Yet you went ahead and used XMLhttp instead of WinHttpRequest. Why? Is there an inherent advantage to XMLhttp that I am missing or is it a personal preference.
3. The big puzzle is: Where did you learn so much about these objects? Seriously. I tried the MSDN web site and didn't find anything useful on this. Is there clear documentation on the objects anywhere? If not how did you become such an expert?
 
Upvote 0
Just a heads up for those of you who are following this thread:

When trying to expand John's code to do some more work I discovered that the .createDocumentFromUrl method will not work if the HTMLDocument has already loaded this URL.
 
Upvote 0
Is there a way to login (in my case I am using the site Portfolio123.com), navigate and click using an InternetExplorer object, but then use a XMLHTTP object for the download itself? This would allow the user to watch ie in action while getting around the download limitation of ie.

The reason that this is desirable is that the actions of XMLHTTP are invisible while IE can be made to be visible. Since the site can take up to 45 seconds+ to generate the data, Excel can seem to freeze when using XMLHTTP. Using IE on the other hand shows a progress bar. Another advantage of starting off using IE is that I already have 100's of lines of code using an IE object. Besides, John's CDFU_Download_Files was not designed to be called repeatedly as it logs in again each time which slows down the process.

The problem I ran into when writing the Excel VBA code was that the XMLHTTP object does not normally use the Session ID and cookies of the IE object. If there is a way to find out the SessionID in IE then it should be simple enough to use
Code:
XMLHTTP.setRequestHeader "JSessionID", ID
. However, while there is a way to get the username and generated password (using ie.Document.Cookie), I have not found a way to find out the Session ID in IE.

Instead, I tried:
Code:
Sub Test1()
    Dim ieDoc As HTMLDocument, ieDoc2 As HTMLDocument
    Dim IE As InternetExplorer
    
    ' ...
    ' Code to login and run backtest here using an InternetExplorer object
    ' ...
      
    Set ieDoc = IE.document
    Set ieDoc2 = ieDoc.createDocumentFromUrl(downloadLinks(0), "")
End Sub
but that gets me the open/save dialog within InternetExplorer which is what we are trying to avoid.

I also tried:
Code:
Sub Test2()
    Dim ieDoc As HTMLDocument
    Dim CookieName$(1 To 2), CookieValue$(1 To 2), s$
    Dim XMLreq As MSXML2.XMLHTTP
    Dim IE As InternetExplorer
    
    ' ...
    ' Code to login and run backtest here using an InternetExplorer object
    ' ...
      
    Set ieDoc = IE.document
    With XMLreq
        .Open "GET", downloadLinks(1), False
        ' ... code to extract the cookie names and values from ieDoc.Cookie ...
        .setRequestHeader CookieName(1), CookieValue(1) ' username
        .setRequestHeader CookieName(2), CookieValue(2) ' Generated Password
        .send
        Debug.Print "Bytes: " & Len(.responseBody)
    End With
End Sub
Result:
Bytes: 0

I even tried:
Code:
    ' ... use a IE object to login etc and then:
    Set ieDoc = IE.document
    Set ieDoc2 = ieDoc.createDocumentFromUrl("https://www.portfolio123.com", "")
    Download_File_Original DownloadDir & "backtest_daily.xls", Link1      'Statistics
But that didn't work either. The .responseBody was Empty in John's Download_File function for some reason even though .Status was 200 and .StatusText was "OK".
 
Upvote 0

Forum statistics

Threads
1,213,527
Messages
6,114,144
Members
448,552
Latest member
WORKINGWITHNOLEADER

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top