How to check if file exists on web

mchac

Well-known Member
Joined
Apr 15, 2013
Messages
531
I would like to be able to check if a file exists and direct VBA execution based on the result. The code I've seen so far in my search all seems to check if a file exists on a local drive. I need to check if the file exists on the web. For example if:

www.sec.gov/Archives/edgar/data/886475/000101905613000180/Financial_Report.xls

exists (which it does) then set a flag to True.

I've tried the Dir command and a couple other things I've seen but unsuccessfully.

Does anyone know of a simple way of doing this in VBA?

Thanks in advance.
 

Excel Facts

Move date out one month or year
Use =EDATE(A2,1) for one month later. Use EDATE(A2,12) for one year later.
This worked for me.
Sub CheckHttpExists(testURL, HttpExists)
Dim oXHTTP As Object
Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
If Not UCase(testURL) Like "HTTP:*" Then
testURL = "http://" & testURL
End If
On Error GoTo haveError
oXHTTP.Open "HEAD", testURL, False
oXHTTP.send
HttpExists = IIf(oXHTTP.Status = 200, "OK", "Not OK")
Exit Sub
haveError:
HttpExists = "Not OK"
End Sub
 
Upvote 0
I was thinking of other things that could be useful in this ftp project pursuit and while searching google, I found that you could pull the directory list straight from the FTP server. I'm not sure if this will help with the current issue or not, but it might spark some ideas.

Take the code John came up with the other day for downloading a file. I've found if you leave off the filename, but still give it a destination file it gives you the directory listing. (referencing code from http://www.mrexcel.com/forum/excel-...e-ftp-commands-visual-basic-applications.html)

Example:

Code:
DownloadFile "ftp://anonymous@ftp.sec.gov/edgar", "C:\rj\excel\secdirlist.txt"

It isn't recursive, but will provide a directory listing in text format, which could in turn be used elsewhere in code.
 
Upvote 0
As a side note, I think I found the jackpot of everything VB and FTP. It uses a slightly different method than John's method of downloading a file, but this looks like it will do everything that ftp can do, but all through VBA. In my searches I've seen many references to this method of doing things. It requires an additional control reference, but if you're only using it yourself, that shouldn't be an issue. If others are going to use it, the reference would have to be setup on their individual machines. I haven't gotten anything to work this way yet. I'll play with it more tomorrow.

Using the Internet Transfer Control
 
Upvote 0
RJ, Thanks for the directory list command. That will come in handy. I'm finding that some companies that had 10-Q filings but didn't have financials filed with it, so there is no Financial_Report.xls to dowload. Thankfully, my VBA continued to execute even though the download command but being able to check first may save some time.

As an aside, I had trouble with John's code yesterday. It kept hanging about every 50-100 downloads and I assumed that was because the connection to the server was lost or interrupted. Based on that assumption I went back to having VBA launch an FTP session on FTP software (passive-ftp) via a Shell command. I assumed that the FTP software would have all the connection error handling that I just don't have the chops to build in VBA.

That method worked better but also hung or gave me a VB file error every 500-1000 records. After more teeth gnashing I found that the problem was that I would execute a Shell and it would take a few seconds for the FTP software to launch and dowload the file. Some of the delay i'm assuming was server or internet traffic but some was also due to the size of the file being downloaded. The next command I had after Shell was Name so I renamed the Financial_Report.xls file to something unique of the form FinRpt_"corp name"_"filing date".xls. Because VBA would execute the Shell but not wait until the 3rd party software completed it's routine (there may be some way to have them talk to each other but that's way over my head) and then execute the Name command, sometimes the file hadn't been downloaded so the Name command didn't fine the file to rename and gave up an error. So i inserted a check that loops until the Financial_Report.xls file appears on my local then I move on to the Name command.

But for the occasional niggle that I'm working through tonight, it works. If you are interested, I'll post the code when I have it sorted out. (by the way, how do I properly post code in this forum? I've just been using copy paste but I know that's not what everyone else is doing).

Now, solving the download file completion issue made me think that the problem I had with John's code was that the process his DownloadFile command kicked off wasn't done with one file before kicking off another download process. I'm also thinking that VBA can only do one file at a time and as a result it hung. I don't know any of this for sure and I don't know how to check it but once I finalize the FTP method described above I'm going to try John's code with that delay loop and time them both for grins.

Thanks for the ITC command list. I've seen some reference to this and Inet. in a few forum posts but I didn't understand what they referred to. This is very helpful. I may try a third method using the Inet1.Execute command structure and also time that to see what's fastest.

You mentioned that this requires an additional control reference (I'm just banging around in this stuff for myself so there are no others PCs to worry about). What do you mean?
 
Upvote 0
Lots of words today eh? LOL 3rd way, maybe even 4th way after this post. :)

First, how to post code. See my signature block. It shows you how. Using that method of posting code is always best because it maintains the code formatting and makes for a quick and simple copy and paste into the editor.

Yesterday, I also read about issues of VBA code continuing before file download is complete. There was code somewhere to overcome that in vba, but unfortunately I didn't copy it over. I'll see if I can find it and incorporate it into the code I'm about to post for you.

Another thought for the shell and ftp method would be to use Chip Pearson's Shell and Wait. I've never used it, but it might be something to check out as an option. Shell And Wait

I still haven't been able to get the Inet1...blah blah blah stuff to work that requires the additional Microsoft Internet Controls, but I think the code I found today might be better than that anyway. It appears that most of that stuff needs to go through a userform, which to me seems kind of pointless when you don't need a userform, but the download of the files. The Microsoft Internet Controls are added by....

In the VB editor click Tools > References then scroll down to Microsoft Internet Controls, check the box and click ok. Although I read somewhere that it could be missing or reference the wrong DLL.

This was one site: Need help can't find Microsoft internet control in VBA list of references

Mine was referencing ieframe.dll before. I'm not familiar enough with these to know which one it should actually be.

Although, through a bunch of searching and playing today, I came up with something that I thought was quite useful. This only works with the http side of things. I tried 1000x times over to get it to work with ftp, but ftp urls produce undesired results. Although, it looks like for the end goal, it might not matter if you download via ftp or http.

Check this code out, although it might have the same problem of continuing before the download is complete. I'll look to see if I can find that code again to ensure it waits long enough. Also, this code, as I just found out, does crash if the file already exists locally, so code would have to be added to deal with a file that the same name exists locally.

Code:
Sub TestFileExistsandDownload()

'This code will test a web address to see if a file exists
'If the file exists, it will download the file.
'   It's a mixture of code retreived from 2 sites:
'   http://www.utteraccess.com/wiki/index.php?title=HttpRequest_Class
'   http://social.msdn.microsoft.com/Forums/en-US/exceldev/thread/62a95cb5-be76-4328-a9b9-ede9f981ba83/

Dim XmlHttpReq As Object
Dim WebURL As String
Dim WebFile As String
Dim SaveTo As String
Dim ObjStream As Object

WebURL = "http://www.sec.gov/Archives/edgar/data/1084869/000110465913008750/"
WebFile = "Financial_Report.xls"
SaveTo = "C:\rj\excel\"
   
Set XmlHttpReq = CreateObject("Microsoft.XMLHTTP")

XmlHttpReq.Open "GET", WebURL & WebFile, False
XmlHttpReq.Send

If XmlHttpReq.Status = 200 Then
'download if file exists
       Set ObjStream = CreateObject("ADODB.Stream")

        ObjStream.Open
        ObjStream.Type = 1
        ObjStream.Write XmlHttpReq.ResponseBody
        ObjStream.SaveToFile (SaveTo & WebFile)
        ObjStream.Close
Else
  MsgBox "Status: " & XmlHttpReq.Status & Chr(10) & _
         "Status Text: " & XmlHttpReq.StatusText
End If

Set XmlHttpReq = Nothing

End Sub

PS: Forget my job, your job is more fun. :) I'd be interested in the finished product, just because it interests me in how I may otherwise use the same techniques.
 
Upvote 0
Remembered the code I saw yesterday after searching around for it. No link, but it smacked me in the head. It's so simple.

Code:
Sub TestFileExistsandDownload()

'This code will test a web address to see if a file exists
'If the file exists, it will download the file.
'   It's a mixture of code retreived from 2 sites:
'   http://www.utteraccess.com/wiki/index.php?title=HttpRequest_Class
'   http://social.msdn.microsoft.com/Forums/en-US/exceldev/thread/62a95cb5-be76-4328-a9b9-ede9f981ba83/

Dim XmlHttpReq As Object
Dim WebURL As String
Dim WebFile As String
Dim SaveTo As String
Dim ObjStream As Object

WebURL = "http://www.sec.gov/Archives/edgar/data/1084869/000110465913008750/"
WebFile = "Financial_Report.xls"
SaveTo = "C:\rj\excel\"
   
Set XmlHttpReq = CreateObject("Microsoft.XMLHTTP")

XmlHttpReq.Open "GET", WebURL & WebFile, False
XmlHttpReq.Send

If XmlHttpReq.Status = 200 Then
       Set ObjStream = CreateObject("ADODB.Stream")
        ObjStream.Open
        ObjStream.Type = 1
        ObjStream.Write XmlHttpReq.ResponseBody
        ObjStream.SaveToFile (SaveTo & WebFile)
        ObjStream.Close
        
        'Since this method saves to cache first then copies
        'to the SaveTo directory, this loop checks the SaveTo
        'directory until the file exists.
        Do
            DoEvents
        Loop Until Dir(SaveTo & WebFile) <> ""
Else
  MsgBox "Status: " & XmlHttpReq.Status & Chr(10) & _
         "Status Text: " & XmlHttpReq.StatusText
End If

Set XmlHttpReq = Nothing

If Dir(SaveTo & WebFile) <> "" Then
    MsgBox "File Downloaded to: " & SaveTo & WebFile
End If

End Sub
 
Upvote 0
Sorry for so many posts, found something else with this method I personally didn't like. With the first "GET" it was downloading the whole file, so it would take a longer time than needed to just check if the file exists.

This one just checks the headers and takes a lot less time, then if it exists, it will download the whole file. I also added in a check for if the file already exists locally, it will delete (Kill) the file, although you might want to do something else, added just so I wouldn't have to keep deleting it manually while playing with the code.

Code:
Sub TestFileExistsandDownload()

'This code will test a web address to see if a file exists
'If the file exists, it will download the file.
'   It's a mixture of code retreived from 2 sites:
'   http://www.utteraccess.com/wiki/index.php?title=HttpRequest_Class
'   http://social.msdn.microsoft.com/Forums/en-US/exceldev/thread/62a95cb5-be76-4328-a9b9-ede9f981ba83/

Dim XmlHttpReq As Object
Dim WebURL As String
Dim WebFile As String
Dim SaveTo As String
Dim ObjStream As Object

WebURL = "http://www.sec.gov/Archives/edgar/data/1084869/000110465913008750/"
WebFile = "Financial_Report.xls"
SaveTo = "C:\rj\excel\"
   
'Delete file if it already exists
If Dir(SaveTo & WebFile) <> "" Then
    Kill SaveTo & WebFile
End If
   
Set XmlHttpReq = CreateObject("Microsoft.XMLHTTP")
'Check headers to see if file exists
XmlHttpReq.Open "HEAD", WebURL & WebFile, False
XmlHttpReq.Send

If XmlHttpReq.Status = 200 Then
    'download file if it exists
    XmlHttpReq.Open "GET", WebURL & WebFile, False
    XmlHttpReq.Send
       Set ObjStream = CreateObject("ADODB.Stream")
        ObjStream.Open
        ObjStream.Type = 1
        ObjStream.Write XmlHttpReq.ResponseBody
        ObjStream.SaveToFile (SaveTo & WebFile)
        ObjStream.Close
        
        'Since this method saves to cache first then copies
        'to the SaveTo directory like IE does, this loop
        'checks the SaveTo directory until the file exists.
        Do
            DoEvents
        Loop Until Dir(SaveTo & WebFile) <> ""
Else
  'what to do if file does not exist
  MsgBox "Status: " & XmlHttpReq.Status & Chr(10) & _
         "Status Text: " & XmlHttpReq.StatusText
End If

Set XmlHttpReq = Nothing

If Dir(SaveTo & WebFile) <> "" Then
    MsgBox "File Downloaded to: " & SaveTo & WebFile
End If

End Sub
 
Upvote 0
I'm back.
Last night after my lengthy post (your comment made me laugh, I find that if I explain to someone what I'm up to helps me sort out my priority list!) I tried to use bat file that launches passiveftp.exe. I had a logical error in the code that waits for the file to download and if the file never shows up I wait forever. Since my Ctrl+Break wasn't working I had to kill the process twice. The second time I did that the ftp software stopped accepting commands. I have no idea what happened but perhaps some flag gets set and I don't know how to clear the settings .. I've emailed the developer. But after a couple of hours of beating my head and rebooting twice I got it to work but by then I'd had enough.

So today I cleaned up the passiveftp method and put in a few bells and whistles and timed downloading the first 250 (of 16k) records. It took ~389 secs. Then I ported all that code over to John's method and made the appropriate adjustments and deletions and the same task took ~204 secs. That blew my mind. Hard to argue with a 47.5% time savings, even with such a limited sample size, so I'm about to kick off the full job in a few mins. My next evolution of this is to figure out how to pull data using XBRL. With that I understand I don't need to pull any files. I just search for the unique balance sheet line tags that I want and the data comes back. Not much useful info on pulling XBRL data around from what I can see so, as they say, this will be an entirely different challange for an entirely different day.

I read through the ShellAndWait idea and thought it to be a slick. Gotta remember that.

I got to a similar place as you re Inet. I don't know the concepts behind it and since there are several ways to do this now without using it I'm going to let it go for the moment.

I ran into the same issue with a file already existing, or at least the name of the file locally being the same as the one I'm saving, and also ended up with a Kill statement to sort it out. In my case the only reason that occurrs is if a firm has filed an amended financial statement and the way the SEC filings index (listed by corporate name) is built, amendments appear below the original filings so as I work down the spreadsheet I end up with the right file. Though I should put in a proper check to future proof this.

I need to think through your http code a bit. I have limited VB skills so some of the code introduces new concepts. Thanks for spending the time to work that code out. I'll work through it tomorrow.

Now to let the dogs out.
 
Upvote 0

Forum statistics

Threads
1,216,037
Messages
6,128,439
Members
449,453
Latest member
jayeshw

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top