Get text from PDF using Excel VBA

CodyMonster

Board Regular
Joined
Sep 28, 2009
Messages
147
I have a PDF file with various fields in the document.
Addresses, cofirmation numbers, amounts..etc.
I'm trying to open a PDF, grab this text (which changes) and put the information into excel.

I'm able to open the PDF in Adobe Acrobat and edit the text and images.
When going into Edit Mode, I can select the text box where the text is avalaible.
However, the text boxes are not named.
Each outlined text in the picture below is a textbox.

Here I would like to grab the forth text box from left to right.
"XYZ, Inc." Since this text box is not named is there any way to tell it to grab this specific text box.
That box will not change but the data inside of it will.

I'm kinda figuring out how to open PDFs in excel using VBA with the help of this page
Karl Heinz Kremer\'s Ramblings

Coverting to a different format isn't working since its dropping information.
For example, the confirm# doesn't come through when exported to a Word Doc or Rich Text.

Any help would be massively apprecitated. :confused::)


 

Some videos you may like

Excel Facts

Can a formula spear through sheets?
Use =SUM(January:December!E7) to sum E7 on all of the sheets from January through December

Kenneth Hobson

Well-known Member
Joined
Feb 6, 2007
Messages
3,092
See if this helps.
Code:
Public Sub ListPDF_Fields()
             
    Dim AcroExchAVDoc As CAcroAVDoc
    Dim AcroExchApp As CAcroApp
    Dim AFORMAUT As AFORMAUTLib.AFormApp
    Dim FormField As AFORMAUTLib.Field
    Dim FormFields As AFORMAUTLib.Fields
    Dim bOK As Boolean
    Dim sFields As String
    Dim sTypes As String
    Dim sFieldName As String
                             
    ' For this procedure to work, computer must have a full version
    ' of Adobe Acrobat installed. Also, a reference to the following
    ' Type Libraries must be made:
    '     AFormAut 1.0
    '     Adobe Acrobat 7.0 (or newer)
    
    On Error GoTo ErrorHandler
    
    Set AcroExchApp = CreateObject("AcroExch.App")
    Set AcroExchAVDoc = CreateObject("AcroExch.AVDoc")
    bOK = AcroExchAVDoc.Open(ActiveWorkbook.Path & "\" & PDF_FILE, "")
    AcroExchAVDoc.BringToFront
    AcroExchApp.Hide
    
    If (bOK) Then
        Set AFORMAUT = CreateObject("AFormAut.App")
        Set FormFields = AFORMAUT.Fields
        For Each FormField In FormFields
            With FormField
                sFieldName = .Name
                If .IsTerminal Then
                    If sFields = "" Then
                        sFields = .Name
                        sTypes = .Type
                    Else
                        sFields = sFields + "," + .Name
                        sTypes = sTypes + "," + .Type
                    End If
                End If
            End With
        Next FormField
        AcroExchAVDoc.Close True
    End If
    Debug.Print sFields
    Debug.Print sTypes
   
    Set AcroExchAVDoc = Nothing
    Set AcroExchApp = Nothing
    Set AFORMAUT = Nothing
    Set Field = Nothing
    Exit Sub
         
ErrorHandler:
    MsgBox "FieldList Error: " + str(Err.Number) + " " + Err.Description + " " + Err.Source
    
End Sub
 

CodyMonster

Board Regular
Joined
Sep 28, 2009
Messages
147
Thanks Kenneth!
Started to use it and it seems to skip through the code.
At the line:
For Each FormField In FormFields it goes to AcroExchAVDoc.Close True
When I mouseover FormFields I get FormFields = Nothing
So it isn't reconizing the data in the field.

Thanks for the help!!
 

CodyMonster

Board Regular
Joined
Sep 28, 2009
Messages
147
Oh, and in case you ask, yes I the libraries AFormAut 1.0 and Adobe Acrobat 10.0 referenced
Also, I know that Adobe is being referenced and working, as the PDF shows when I get to AcroExchAVDoc.BringToFront.
 

Kenneth Hobson

Well-known Member
Joined
Feb 6, 2007
Messages
3,092
Can you post a link to a simple pdf file to a free shared site?
 

Kenneth Hobson

Well-known Member
Joined
Feb 6, 2007
Messages
3,092
I don't see any textboxes in that. Of course no fields exist if the file was flattened.

I will open it in acrobat tomorrow and check for fields rather than using a 3rd party pdf program.
 

CodyMonster

Board Regular
Joined
Sep 28, 2009
Messages
147
I don't see any textboxes in that. Of course no fields exist if the file was flattened.

I will open it in acrobat tomorrow and check for fields rather than using a 3rd party pdf program.

Thanks again.
When I open it in Adobe Standard and go to Tools and Edit Text I'm able to get to the boxes with text in it.
If this was flattened, wouldn't it be like a picture? and you wouldn't be able to edit the text at all?

Thank again for your help Kenneth. I'm not familiar with doing anything with PDF of this nature..
 

Kenneth Hobson

Well-known Member
Joined
Feb 6, 2007
Messages
3,092
We can easily see if this is a form and has fields. After jso was Set.
Code:
Debug.Print jso.numfields
The value in your file is 0.

I think that your best bet is to use something like this. In the example, you can see that I showed the value for the 2nd index value in the parsed (split) array. 1 is 2nd since a string array is always 0 based.

Of course change the dropbox path to my file to yours.

Code:
'http://www.eileenslounge.com/viewtopic.php?f=30&t=5907
Sub Demo()
  Dim s() As String
  Dim strPDF As String, strTmp As String, i As Integer
  ' The next ten lines and the last line in this sub can help if
  ' you get "ActiveX component can't create object" errors even
  ' though a Reference to Acrobat is set in Tools|References.
  Dim bTask As Boolean
    bTask = True
  'If Tasks.Exists(Name:="Adobe Acrobat Professional") = False Then
    bTask = False
    Dim AdobePath As String, WshShell As Object
    Set WshShell = CreateObject("Wscript.shell")
    AdobePath = WshShell.RegRead("HKEY_CLASSES_ROOT\acrobat\shell\open\command\")
    AdobePath = Trim(Left(AdobePath, InStr(AdobePath, "/") - 1))
    Shell AdobePath, vbHide
  'End If
  'Replace FilePath & Filename with the correct FilePath & Filename for the pdf file to be read.
  
  strPDF = ReadAcrobatDocument("C:\Users\130103\Dropbox\MOAB.v3.pdf") '***Use your path.
  
  'ActiveDocument.Range.InsertAfter strPDF
  Debug.Print strPDF
  s() = Split(strPDF, vbLf)
  MsgBox s(1), vbInformation, "2nd vblf delimited text"
  'If bTask = False Then Tasks.Item("Adobe Acrobat Professional").Close
End Sub


Public Function ReadAcrobatDocument(strFileName As String) As String
  'Note: A Reference to the Adobe Library must be set in Tools|References!
  Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
  Dim AcroHiliteList As CAcroHiliteList, AcroTextSelect As CAcroPDTextSelect
  Dim PageNumber, PageContent, Content, i, j
  Set AcroApp = CreateObject("AcroExch.App")
  Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
  If AcroAVDoc.Open(strFileName, vbNull) <> True Then Exit Function
  ' The following While-Wend loop shouldn't be necessary but timing issues may occur.
  While AcroAVDoc Is Nothing
    Set AcroAVDoc = AcroApp.GetActiveDoc
  Wend
  Set AcroPDDoc = AcroAVDoc.GetPDDoc
  For i = 0 To AcroPDDoc.GetNumPages - 1
    Set PageNumber = AcroPDDoc.AcquirePage(i)
    Set PageContent = CreateObject("AcroExch.HiliteList")
    If PageContent.Add(0, 9000) <> True Then Exit Function
    Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
    ' The next line is needed to avoid errors with protected PDFs that can't be read
    On Error Resume Next
    For j = 0 To AcroTextSelect.GetNumText - 1
      Content = Content & AcroTextSelect.GetText(j)
    Next j
  Next i
  ReadAcrobatDocument = Content
  AcroAVDoc.Close True
  AcroApp.Exit
  Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
 

CodyMonster

Board Regular
Joined
Sep 28, 2009
Messages
147
Okay.. looks like its working. Will need to play with it for a bit.
But, I understand what were doing here (sorta), but thanks for the info.
Much appreciated!!!!
 

Watch MrExcel Video

Forum statistics

Threads
1,099,567
Messages
5,469,453
Members
406,652
Latest member
LJA

This Week's Hot Topics

Top