Import PDF to Excel - Can anyone edit this VBA code?

navic

Active Member
Joined
Jun 14, 2015
Messages
346
Office Version
  1. 2013
Platform
  1. Windows
Can anyone edit this VBA code to work properly.

I use Adobe Acrobat Reader DC installed in the path
C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\

The VBA code is below

in form mudule named 'frm_pdfimp'
Code:
Option Explicit

Private Sub cmd_imp_Click()

'check import option

If opt_xl.Value = False And opt_txt.Value = False Then
    MsgBox "Please select one of the import mode"
    Exit Sub
End If

Dim OS_FSO As Object

Set OS_FSO = CreateObject("Scripting.filesystemobject")

Dim PDF_Path As String, Txt_Fol As String

PDF_Path = txt_pdf.Text

'check the PDF file exists

If OS_FSO.fileexists(PDF_Path) = False Then
    MsgBox "PDF file not found"
    Set OS_FSO = Nothing
    Exit Sub
End If

If opt_txt.Value = True Then
   
    Txt_Fol = txt_txt.Text
   
    'check the folder for text file if import PDF data into text file
   
    If OS_FSO.folderexists(Txt_Fol) = False Then
        MsgBox "Folder '" & Txt_Fol & "' not exist please select valid folder"
        Set OS_FSO = Nothing
        Exit Sub
    End If
   
   
    'import into text files
    Call Imp_Into_Txt(PDF_Path, Txt_Fol, chk_txt.Value)
   
End If

If opt_xl.Value = True Then
   
    'import into text files
    Call Imp_Into_XL(PDF_Path, chk_xl.Value)
   
End If
   
   
End Sub

Private Sub cmd_pdf_Click()

Dim Dlg_File As FileDialog
Set Dlg_File = Application.FileDialog(msoFileDialogFilePicker)

txt_pdf.Text = ""

With Dlg_File
    .Filters.Add "PDF Files", "*.pdf"
    If .Show = -1 Then
        txt_pdf.Text = .SelectedItems(1)
    End If
End With

Set Dlg_File = Nothing

End Sub

Private Sub cmd_txt_Click()

'get the folder for save text file(s)

Dim Dlg_Fol As FileDialog
Set Dlg_Fol = Application.FileDialog(msoFileDialogFolderPicker)
txt_txt.Text = ""

If Dlg_Fol.Show = -1 Then
    txt_txt.Text = Dlg_Fol.SelectedItems(1)
End If

Set Dlg_Fol = Nothing

End Sub

Private Sub opt_txt_Click()

Call Con_Txt(True)

End Sub

Private Sub opt_xl_Click()

Call Con_Txt(False)

End Sub

Private Sub Con_Txt(Ena As Boolean)

'set the intial value

txt_txt.Enabled = Ena
cmd_txt.Enabled = Ena
chk_txt.Enabled = Ena
chk_xl.Enabled = Not Ena

End Sub

When I run VBA, this line of code is highlighted. ("Can't find project or library").
Line of code
Code:
Dim AC_PD As Acrobat.AcroPDDoc              'access pdf file !!! error is here !!!
in standard module named 'pdf2text'
Code:
Option Explicit

Sub Main_Import()
frm_pdfimp.Show
End Sub

Sub Imp_Into_XL(PDF_File As String, Each_Sheet As Boolean)

'This procedure get the PDF data into excel by following way

'1.Open PDF file
'2.Looping through pages
'3.get the each PDF page data into individual _
  sheets or single sheet as defined in Each_Sheet Parameter


Dim AC_PD As Acrobat.AcroPDDoc              'access pdf file !!! error!!!
Dim AC_Hi As Acrobat.AcroHiliteList         'set selection word count
Dim AC_PG As Acrobat.AcroPDPage             'get the particular page
Dim AC_PGTxt As Acrobat.AcroPDTextSelect    'get the text of selection area

Dim WS_PDF As Worksheet
Dim RW_Ct As Long                           'row count
Dim Col_Num As Integer                      'column count
Dim Li_Row As Long                          'Maximum rows limit for one column
Dim Yes_Fir As Boolean                      'to identify beginning of page

Li_Row = Rows.Count

Dim Ct_Page As Long                         'count pages in pdf file
Dim i As Long, j As Long, k As Long         'looping variables
Dim T_Str As String

Dim Hld_Txt As Variant                      'get PDF total text into array

RW_Ct = 0                                   'set the intial value
Col_Num = 1                                 'set the intial value

Application.ScreenUpdating = False

Set AC_PD = New Acrobat.AcroPDDoc
Set AC_Hi = New Acrobat.AcroHiliteList

'set maximum selection area of PDF page

AC_Hi.Add 0, 32767

With AC_PD
   
    'open PDF file
   
    .Open PDF_File
   
    'get the number of pages of PDF file
   
    Ct_Page = .GetNumPages
   
    'if get pages is failed exit sub
   
    If Ct_Page = -1 Then
        MsgBox "Pages Cannot determine in PDF file '" & PDF_File & "'"
        .Close
        GoTo h_end
    End If
   
    'add sheet only one time if Data retrive in one sheet
   
    If Each_Sheet = False Then
        Set WS_PDF = Worksheets.Add(, Worksheets(Sheets.Count))
        WS_PDF.Name = "PDF2Text"
    End If
   
    'looping through sheets
   
    For i = 1 To Ct_Page
       
        T_Str = ""
        'get the page
        Set AC_PG = .AcquirePage(i - 1)
       
        'get the full page selection
        Set AC_PGTxt = AC_PG.CreateWordHilite(AC_Hi)
       
        'if text selected successfully get the all the text into T_Str string
       
        If Not AC_PGTxt Is Nothing Then
       
            With AC_PGTxt
               
                For j = 0 To .GetNumText - 1
                    T_Str = T_Str & .GetText(j)
                Next j
               
            End With
           
        End If
       
       
        If Each_Sheet = True Then
           
            'add each sheet for each page
           
            Set WS_PDF = Worksheets.Add(, Worksheets(Sheets.Count))
           
        End If
       
        'transfer PDF data into sheet
       
        With WS_PDF
           
            If Each_Sheet = True Then
           
                .Name = "Page-" & i
               
                'get the PDF data into each sheet for each PDF page
               
                'if text accessed successfully then split T_Str by VbCrLf
                'and get into array Hld_Txt and looping through array and fill sheet with PDF data
               
                If T_Str <> "" Then
                    Hld_Txt = Split(T_Str, vbCrLf)
                   
                    For k = 0 To UBound(Hld_Txt)
                        T_Str = CStr(Hld_Txt(k))
                        If Left(T_Str, 1) = "=" Then T_Str = "'" & T_Str
                        .Cells(k + 1, 1).Value = T_Str
                    Next k
                        Else
                           
                            'information if text not retrive from PDF page
                       
                            .Cells(1, 1).Value = "No text found in page " & i
                End If
               
                    Else
                       
                        'get the pdf data into single sheet
                       
                        If T_Str <> "" Then
                            Hld_Txt = Split(T_Str, vbCrLf)
                           
                            Yes_Fir = True
                           
                            For k = 0 To UBound(Hld_Txt)
                           
                                RW_Ct = RW_Ct + 1
                               
                                'check begining of page if yes enter PDF page number for any idenfication
                               
                                If Yes_Fir Then
                                    RW_Ct = RW_Ct + 1
                                    .Cells(RW_Ct, Col_Num).Value = "Text In Page - " & i
                                    RW_Ct = RW_Ct + 2
                                    Yes_Fir = False
                                End If
                               
                                'check for maximum rows if exceeds start from next column
                               
                                If RW_Ct > Li_Row Then
                                    RW_Ct = 1
                                    Col_Num = Col_Num + 1
                                End If
                               
                                T_Str = CStr(Hld_Txt(k))
                                If Left(T_Str, 1) = "=" Then T_Str = "'" & T_Str
                                .Cells(RW_Ct, Col_Num).Value = T_Str
                               
                            Next k
                           
                                Else
                                   
                                    RW_Ct = RW_Ct + 1
                                    .Cells(RW_Ct, Col_Num).Value = "No text found in page " & i
                                    RW_Ct = RW_Ct + 1
                           
                        End If
                           
            End If
                   
        End With
    Next i
   
    .Close
   
End With
           
Application.ScreenUpdating = True

MsgBox "Imported"
           
h_end:
   
Set WS_PDF = Nothing
Set AC_PGTxt = Nothing
Set AC_PG = Nothing
Set AC_Hi = Nothing
Set AC_PD = Nothing
           
End Sub


Sub Imp_Into_Txt(T_PDF_File As String, Fol_Path As String, Each_Page As Boolean)

'same as above procedure instead of sheets use text files

Dim AC_PD As Acrobat.AcroPDDoc
Dim AC_Hi As Acrobat.AcroHiliteList
Dim AC_PG As Acrobat.AcroPDPage
Dim AC_PGTxt As Acrobat.AcroPDTextSelect

Dim OS_FSO As Object
Dim OS_TxtFile As Object

Set OS_FSO = CreateObject("Scripting.filesystemobject")

Dim Ct_Page As Long
Dim i As Long, j As Long, k As Long
Dim T_Str As String

Dim Hld_Txt As Variant

Set AC_PD = New Acrobat.AcroPDDoc
Set AC_Hi = New Acrobat.AcroHiliteList
AC_Hi.Add 0, 32767

With AC_PD
   
    .Open T_PDF_File
    Ct_Page = .GetNumPages
   
    If Ct_Page = -1 Then
        MsgBox "Pages Cannot determine in PDF file '" & T_PDF_File & "'"
        .Close
        GoTo h_end
    End If
   
    If Each_Page = False Then
        Set OS_TxtFile = OS_FSO.createtextfile(Fol_Path & "\pdf2text.txt")
    End If
   
    For i = 1 To Ct_Page
       
        T_Str = ""
        Set AC_PG = .AcquirePage(i - 1)
       
        Set AC_PGTxt = AC_PG.CreateWordHilite(AC_Hi)
       
        If Not AC_PGTxt Is Nothing Then
       
            With AC_PGTxt
               
                For j = 0 To .GetNumText - 1
                    T_Str = T_Str & .GetText(j)
                Next j
               
            End With
           
        End If
       
        If T_Str = "" Then T_Str = "No text found in page " & i
       
        If Each_Page = True Then
            Set OS_TxtFile = OS_FSO.createtextfile(Fol_Path & "\Page-" & i & ".txt")
            OS_TxtFile.write T_Str
            OS_TxtFile.Close
            Set OS_TxtFile = Nothing
                Else
                   
                    T_Str = vbCrLf & vbCrLf & "Text In Page - " & i & vbCrLf & vbCrLf & T_Str
                    OS_TxtFile.write T_Str
        End If
    Next i
   
    If Each_Page = False Then OS_TxtFile.Close
    .Close
   
End With

MsgBox "Imported"

h_end:

Set OS_TxtFile = Nothing
Set OS_FSO = Nothing
Set AC_PGTxt = Nothing
Set AC_PG = Nothing
Set AC_Hi = Nothing
Set AC_PD = Nothing

End Sub
The source vba code is at this link (VBA Express : Excel - get the data from PDF file into Excel sheet(s) or text file(s))
Link to download example file (http://www.vbaexpress.com/kb/default.php?action=13&kb_id=1101)
 

Excel Facts

Difference between two dates
Secret function! Use =DATEDIF(A2,B2,"Y")&" years"&=DATEDIF(A2,B2,"YM")&" months"&=DATEDIF(A2,B2,"MD")&" days"
When I run VBA, this line of code is highlighted. ("Can't find project or library").
Line of code
Code:
Dim AC_PD As Acrobat.AcroPDDoc              'access pdf file !!! error is here !!!
You need Adobe Acrobat Pro, not the free Reader, to use the Acrobat library.
 
Upvote 0
Solution
Uhhhh, bad luck.
So no solution for free acrobat reader?
@John_w , In any case, thank you for reporting.
 
Upvote 0
I know this, but there are multiple actions connected one after another.
After importing from PDF, Excel further calculates based on the imported data.
Thanks anyway for the tip.
 
Upvote 0

Forum statistics

Threads
1,214,819
Messages
6,121,746
Members
449,050
Latest member
excelknuckles

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top