Results 1 to 4 of 4

Thread: Loop through Folder for PDF Files - Open PDF in Word and extract text into Excel
Thanks Thanks: 0 Likes Likes: 0

  1. #1
    Board Regular
    Join Date
    Jan 2013
    Posts
    340
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Loop through Folder for PDF Files - Open PDF in Word and extract text into Excel

    Hello,

    I am using the code below in Excel to open a PDF File in Word andextract the data as text only into Excel. The code works perfectly fine as itis, however, I want to loop through all pdf files within the folder instead ofhardcoding the filename in pathAndFileName variable.

    My goal is to be able to:

    1. Open each PDF file in Word
    2. Extract its data as text into the Excel “Test”
    3. Run two excel macros to parse the data that Ineed into another sheet within the same workbook set in the code
    4. Close Word without saving the file
    5. Clear both Word and PDF
    6. Clear the data in Excel “Test” worksheet
    7. Move PDF File to another folder
    8. Repeat steps 1-6 for the next PDF File


    Can anyone assist in tweaking this code to do what I need?

    Code:
    Sub pdf_To_Excel_Word()
    'Macro opens PDF Files as an editable Word Document
    'Copies the contents of the Word document
    'Pastes the Clipboard contents into Excel
    
    'Declare Variables
        Dim myWorksheet As Worksheet
        Dim wordApp AsWord.Application
        Dim myWshShell As wshShell
        Dim pathAndFileName As String
        Dim registryKey As String
        Dim wordVersion As String
    
    'Set Variables
        Set myWorksheet =ActiveWorkbook.Worksheets("Test")
        Set wordApp = NewWord.Application
        Set myWshShell = New wshShell
        pathAndFileName ="C:\Users\mh15601\Desktop\Projects\Audit Plan\2019\Testing\PDF Files\DONOT DELETE\A171065-ICG-TTS-Cash Management and Trade-Citibank NA-IndonesiaAudit Report.pdf"
        wordVersion = wordApp.Version
        registryKey ="HKCU\SOFTWARE\Microsoft\Office\" & wordVersion &"\Word\Options\"
    
    'Open and Copy PDF Files
        myWshShell.RegWriteregistryKey & "DisableConvertPdfWarning", 1,"REG_DWORD"
    
        wordApp.documents.Open _
           Filename:=pathAndFileName, _
            confirmconversions:=False
    
        myWshShell.RegWriteregistryKey & "DisableConvertPdfWarning", 0,"REG_DWORD"
    
    'Copy Data from Word
       wordApp.ActiveDocument.Content.Copy
    
    'Excel
        With myWorksheet
           .Range("A1").Select
            .PasteSpecialFormat:="Text"
        End With
    
    'Close Word
        wordApp.QuitSaveChanges:=wDoNotSaveChanges
    
    'Clear Word and PDF
        Set wordApp = Nothing
        Set myWshShell = Nothing
    
    
    End Sub 


    Please note: there has been an issue when I type directly in the forum.So I have been typing these messages in word and copy/pasting into the forum.So some of my sentences/words may be missing spaces. That is not how it’s beingtyped it’s just how it’s being pasted for some reason.

    Thank you


  2. #2
    Board Regular Rijnsent's Avatar
    Join Date
    Oct 2005
    Location
    Utrecht, Holland
    Posts
    1,080
    Post Thanks / Like
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Loop through Folder for PDF Files - Open PDF in Word and extract text into Excel

    Hi MHamid,
    first of all: cool that you have a directory named "DONOT DELETE" . Secondly: this code helps to loop though files: https://exceloffthegrid.com/vba-code...r-sub-folders/
    Integrated with your code that would be something like the code below (untested).
    Hope, that helps,
    Koen

    Code:
    Sub pdf_To_Excel_Word_2()
    'Macro opens PDF Files as an editable Word Document
    'Copies the contents of the Word document
    'Pastes the Clipboard contents into Excel
    
    'Declare Variables
        Dim myWorksheet As Worksheet
        Dim wordApp AsWord.Application
        Dim myWshShell As wshShell
        Dim pathAndFileName As String
        Dim registryKey As String
        Dim wordVersion As String
        Dim fileName As Variant
    
    'Set Variables
        Set myWorksheet = ActiveWorkbook.Worksheets("Test")
        Set wordApp = NewWord.Application
        Set myWshShell = New wshShell
        
        wordVersion = wordApp.Version
        registryKey = "HKCU\SOFTWARE\Microsoft\Office\" & wordVersion & "\Word\Options\"
    
    fileName = Dir("C:\Users\mh15601\Desktop\Projects\Audit Plan\2019\Testing\PDF Files\DONOT DELETE\*.pdf")
    
    While fileName <> ""
        
        'Insert the actions to be performed on each file
        'This example will print the file name to the immediate window
        Debug.Print fileName
    
    
        'Open and Copy PDF Files
            myWshShell.RegWriteregistryKey & "DisableConvertPdfWarning", 1,"REG_DWORD"
            wordApp.documents.Open fileName:=pathAndFileName, confirmconversions:=False
            myWshShell.RegWriteregistryKey & "DisableConvertPdfWarning", 0,"REG_DWORD"
        'Copy Data from Word
           wordApp.ActiveDocument.Content.Copy
        
        'Excel
            LastRow = myWorksheet.Cells(Cells.Rows.Count, 1).End(xlUp).Row
            With myWorksheet
               .Range("A" & LastRow).Select
                .PasteSpecialFormat:="Text"
            End With
        
        'Close Word
            wordApp.QuitSaveChanges:=wDoNotSaveChanges
    
        'Set the fileName to the next file
        fileName = Dir
    Wend
    
    'Clear Word and PDF
        Set wordApp = Nothing
        Set myWshShell = Nothing
    
    
    End Sub
    You can't post attachments here, but you can help me helping you by posting a screen shot directly in your post with any of those tools.
    Otherwise use dropbox/google drive/etc to get your file accross (not preferred). For code, put it inside these tags: [ CODE][/CODE]. Do check the forum rules.
    Finally, please show that you made an effort to solve your problem: Yes, I like to help, but am not going to do your job.

  3. #3
    Board Regular
    Join Date
    Jan 2013
    Posts
    340
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Loop through Folder for PDF Files - Open PDF in Word and extract text into Excel

    Hello,

    I am getting an error message stating "Run-time error '5174': Sorry, we couldn't find your file. Was it moved, renamed, or deleted?". The file exists and it is pointing to the correct location.
    When I click Debug the highlighted portion of the code is below

    Code:
    wordApp.Documents.Open fileName:=pathAndFileName, ConfirmConversions:=False
    Full Code:
    Code:
    Sub pdf_To_Excel_Word_2()
    'Macro opens PDF Files as an editable Word Document
    'Copies the contents of the Word document
    'Pastes the Clipboard contents into Excel
    
    'Declare Variables
        Dim myWorksheet As Worksheet
        Dim wordApp As Object
        Dim myWshShell As wshShell
        Dim pathAndFileName As String
        Dim registryKey As String
        Dim wordVersion As String
        Dim fileName As Variant
    'Set Variables
        Set myWorksheet = ActiveWorkbook.Worksheets("Test")
        Set wordApp = New Word.Application
        Set myWshShell = New wshShell
        
        wordVersion = wordApp.Version
        registryKey = "HKCU\SOFTWARE\Microsoft\Office\" & wordVersion & "\Word\Options\"
    pathAndFileName = Dir("C:\Users\mh15601\Desktop\Projects\Audit Plan\2019\Testing\PDF Files\DO NOT DELETE\*.pdf")
    While pathAndFileName <> ""
        
        'Insert the actions to be performed on each file
        'This example will print the file name to the immediate window
        'Debug.Print fileName
    
        'Open and Copy PDF Files
            myWshShell.RegWrite registryKey & "DisableConvertPdfWarning", 1, "REG_DWORD"
            wordApp.Documents.Open fileName:=pathAndFileName, ConfirmConversions:=False
            myWshShell.RegWrite registryKey & "DisableConvertPdfWarning", 0, "REG_DWORD"
        
        'Copy Data from Word
           wordApp.ActiveDocument.Content.Copy
        
        'Excel
            LastRow = myWorksheet.Cells(Cells.Rows.Count, 1).End(xlUp).Row
            With myWorksheet
                .Range("A" & LastRow).Select
                .PasteSpecial Format:="Text"
            End With
        
        'Close Word
            wordApp.Quit SaveChanges:=wDoNotSaveChanges
        'Set the pathAndFileName to the next file
        pathAndFileName = Dir
    Wend
    'Clear Word and PDF
        Set wordApp = Nothing
        Set myWshShell = Nothing
    
    End Sub
    Thank you

  4. #4
    Board Regular Rijnsent's Avatar
    Join Date
    Oct 2005
    Location
    Utrecht, Holland
    Posts
    1,080
    Post Thanks / Like
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Loop through Folder for PDF Files - Open PDF in Word and extract text into Excel

    Hi MHamid,
    searching for your error code online makes me think that there is probably a space in your filename. See this post: https://answers.microsoft.com/en-us/...0-455114cc904e
    Cheers,
    Koen
    You can't post attachments here, but you can help me helping you by posting a screen shot directly in your post with any of those tools.
    Otherwise use dropbox/google drive/etc to get your file accross (not preferred). For code, put it inside these tags: [ CODE][/CODE]. Do check the forum rules.
    Finally, please show that you made an effort to solve your problem: Yes, I like to help, but am not going to do your job.

Some videos you may like

User Tag List

Tags for this Thread

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •