vba - excel 2010 - use macro to download pdf files, rename the files and save them in a folder

abeltio

New Member
Joined
Nov 21, 2011
Messages
4
Good times everyone,

Using: windows 7 professional, excel 2010 and ie 8.0

Please forgive my almost close to "absolute zero" knowledge of vba - an absolute beginner, in the immortal words of David Bowie -

Googling, cutting, pasting and with trial and error and error and error... managed to prepare some code for what i needed and so far been successful thanks to the work of many member and MVPs in this forum.

Now i am truly stuck, please have mercy: don't send me to interpret code that does not contain the nomenclature listed below! :eek:)

DESCRIPTION
01. Given an intranet URL, lets call it: doc_list
This is a URL (company intranet, so it is useless to post it - there is no access from outside) that contains a list of documents with hyperlinks.

02. How does doc_list look?
The URL doc_list has a list of documents, and the list is organized as follows:
10-2R1 Title of document 10
0101-R3 Title of document 0101
...
...
1820 Title of document 1820

There are about 2500 documents.

03. How does doc_list work?
The numbers on the left have hyperlinks to a pdf file, or sometimes HTML file that contains the document itself.

When clicking on the link provided the target pdf or HTML file opens.

Check the hyperlinks in the example above: the pdf or HTML filenames "look like" the nomenclature shown in doc_list but are not identical, the hyphen is gone and the "R" converted to an "r", sometimes! There is no consistency.

04. What do i need to do?
Create a vba macro in excel 2010 that will:

04.a - open the doc_list URL

04.b - FOR EACH DOCUMENT IN doc_list:

04.b.1 Follow the link provided**

04.b.2 - create a folder named T-pdf in a path selected by the user (with a browse function similar to "Save As")
04.b.3 - save each target file of the hyperlinks in doc_list in the folder called T-pdf we just created

04.b.4 - print all the HTML files saved in folder T-pdf to pdf

04.b.5 - delete all the HTML files from folder T-pdf

04.b.6 - capture the filename of each pdf file into variable doc_name.

04.b.7 - calculate a new filename, save that name in variable newdoc_name.
The new name is consistent with the following nomenclature: T00xy-mRn.pdf or T0xyz-mRn.pdf or Twxyz-mRn.pdf
Where: m, R, and n could be present or not, and the number of leading zeroes is used to allow proper numerical sorting of the files.

Already have a procedure to calculate the correct filename newdoc_name once the current filename is captured by variable doc_name, let's call that procedure: run_index

04.b.8 - rename (or save as) each pdf file with the name calculated by run_index in 04.g stored in variable newdoc_name

04.c - end when all the documents listed in doc_list were processed



** the HTML source of doc_list contains all the links, perhaps it can be used.

Hope you can help me with this, please treat me as a complete illiterate - i don't mind... in fact: i deserve it!

Muchas gracias
a.
 

Excel Facts

Excel Joke
Why can't spreadsheets drive cars? They crash too often!
I got you thru 4.b.3 -- I think to generate your pdfs you'll need some other third party tool. Also, I didn't follow your nomenclature rules for the new filenames, sry.

Code:
' This is an API... since you don't know so much about programming, it seems, then just take my word for it that you need this.
Private Declare Function URLDownloadToFile Lib "urlmon" Alias _
  "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal _
    szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
 
Sub PullFilesDown()
 
    'Setting up variables.  "Dim" is short for Dimension, and basically we're telling the program to reserve some space in its memory for information that we want to track
    Dim mainBook As Workbook
    Dim intranetLink As String
    Dim saveDialog As FileDialog
    Dim savePath As String
    Dim filename As String
    Dim fileCtr As Integer
    fileCtr = 0
 
    'This line uses the "saveDialog" variable and tells Excel to open up a dialog box (and in this macro we're going to that box 'saveDialog') for the user so that they can choose a destination folder for all the files we want to save.  It's going to keep the name of that folder in the variable "savePath"
    Set saveDialog = Application.FileDialog(msoFileDialogFolderPicker)
    With saveDialog
        .Title = "Select a Folder"              'sticks a title on the dialog so the user kind of knows what they're supposed to be doing
        .AllowMultiSelect = False               'prevents the user from selecting more than one item out the dialog.
        .InitialFileName = strPath              '
        If .Show <> -1 Then GoTo FolderBombed   'if the user does something funky or cancels, abort the rest of the macro.
        savePath = .SelectedItems(1)            'get the file path to the selected folder
    End With
    savePath = savePath + "\T-pdf"              'update the savePath string so that it points to a subdirectory 'T-pdf' in the selected directory (which doesn't exist)
    MkDir (savePath)                            'create T-pdf
 
    intranetLink = "[URL]http://intranet/[/URL]"           'YOUR doc_list url goes here!!!'
    Set mainBook = Workbooks.Open(intranetLink) 'open that page IN excel.  This is a little ugly, but it's going to let us use the Hyperlinks collection and avoid all kinds of ugly string manipulation & expression based searches to hunt out the URLS in doc_list
    mainBook.Activate
 
    For Each link In mainBook.ActiveSheet.Hyperlinks                'cycle through the following code for every hyperlink in doc_list.
        On Error Resume Next                                        'if there's a problem, just skip it and try to do the next thing.  This would be considered pretty sloppy programming in most circles, you should handle errors (particularly in file operations).  Sadly, I'm not going to go into all of that for you.  Consider doing your own research, tho.
        filename = savePath + "\" + link.TextToDisplay + ".html"    'some string manipulation to get the path and a filename together.
        URLDownloadToFile 0, link.Address, filename, 0, 0           'download the file on the other end of the link!  again, no error handling. there's a problem, there's no way for you to know about it right now.
        On Error GoTo 0                                             'set error handling to it's usual state.  (if there's a problem, just bomb the macro and stop executing)
    Next link
    End
 
FolderBombed:
    MsgBox ("Invalid folder.  Process aborted")
End Sub
 
Upvote 0

Forum statistics

Threads
1,214,944
Messages
6,122,392
Members
449,081
Latest member
JAMES KECULAH

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top