Open PDF from listed in column A, search for text in PDF that is in adjacent cell in Column B, and return a TRUE/FALSE Value

oreo haven

Board Regular
Joined
May 15, 2008
Messages
65
All,
I have a problem where several invoices may have the wrong information written into them. As long as the number in Column B is inside the PDF listed in Column A, I know that it is ok.
So I would like to search the file listed in column A and see if the strin listed in the corresponding row in Column B is found in it. If so, I just need a "True" to be written to the corresponding Cell in Column C.
I have 3800 invoices to check, right now. Depending on the results, I could end up with 30k more files to check.
I do have Acrobat Pro and have enabled the libraries.
Has anyone here run into this before? The table below is what I would prefer to have in the end.

FULL_PATH TO PDF
STRING_TO_FINDIs_It_There
c:\users\jsd\documents\invoices\18996_58720_INV767056.pdf767056
TRUE​
c:\users\jsd\documents\invoices\18996_58728_INV767057.pdf767057
TRUE​
c:\users\jsd\documents\invoices\18996_58736_INV767058.pdf767058
TRUE​
c:\users\jsd\documents\invoices\18996_58744_INV767059.pdf767059
FALSE​
c:\users\jsd\documents\invoices\18996_58752_INV767060.pdf767060
TRUE​
c:\users\jsd\documents\invoices\18996_58760_INV767065.pdf767065
TRUE​
c:\users\jsd\documents\invoices\18996_58768_INV767066.pdf767066
FALSE​
c:\users\jsd\documents\invoices\18996_58776_INV767067.pdf767067
FALSE​
c:\users\jsd\documents\invoices\18996_58784_INV767069.pdf767069
TRUE​
 
did you use this?
report.png
 
Upvote 0

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
Bit of a mess as both threads had responses but I have merged them and moved to excel questions as requested.
 
Upvote 0
All,
I have a problem where several invoices may have the wrong information written into them. As long as the number in Column B is inside the PDF listed in Column A, I know that it is ok.
So I would like to search the file listed in column A and see if the strin listed in the corresponding row in Column B is found in it. If so, I just need a "True" to be written to the corresponding Cell in Column C.
I have 3800 invoices to check, right now. Depending on the results, I could end up with 30k more files to check.
I do have Acrobat Pro and have enabled the libraries.
Has anyone here run into this before? The table below is what I would prefer to have in the end.

FULL_PATH TO PDF
STRING_TO_FINDIs_It_There
c:\users\jsd\documents\invoices\18996_58720_INV767056.pdf767056
TRUE​
c:\users\jsd\documents\invoices\18996_58728_INV767057.pdf767057
TRUE​
c:\users\jsd\documents\invoices\18996_58736_INV767058.pdf767058
TRUE​
c:\users\jsd\documents\invoices\18996_58744_INV767059.pdf767059
FALSE​
c:\users\jsd\documents\invoices\18996_58752_INV767060.pdf767060
TRUE​
c:\users\jsd\documents\invoices\18996_58760_INV767065.pdf767065
TRUE​
c:\users\jsd\documents\invoices\18996_58768_INV767066.pdf767066
FALSE​
c:\users\jsd\documents\invoices\18996_58776_INV767067.pdf767067
FALSE​
c:\users\jsd\documents\invoices\18996_58784_INV767069.pdf767069
TRUE​
Note that sandy666's Power Query code searches the file names, not the file contents.

This macro uses the Acrobat API to search the file contents.

VBA Code:
Public Sub Search_All_PDFs()

    Dim lastRow As Long, r As Long
   
    With ActiveWorkbook.ActiveSheet
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        For r = 2 To lastRow
            .Cells(r, "C").ClearContents
            .Cells(r, "C").Value = Search_PDF(.Cells(r, "A").Value, .Cells(r, "B").Value)
        Next
    End With
   
End Sub


Private Function Search_PDF(PDFfullName As String, findText As String) As Boolean
    
    Dim AcroPDDoc      As Object
    Dim AcroHiliteList As Object
    Dim AcroPDPage     As Object
    Dim AcroTextSelect As Object
    Dim p As Long, i As Long
    
    Search_PDF = False
    
    Set AcroPDDoc = CreateObject("AcroExch.PDDoc")
    
    If Not AcroPDDoc.Open(PDFfullName) Then
        MsgBox "Unable to open " & PDFfullName
        Exit Function
    End If
     
    p = 0
    While p < AcroPDDoc.GetNumPages And Search_PDF = False
        Set AcroHiliteList = CreateObject("AcroExch.HiliteList")
        If AcroHiliteList.Add(0, 9999) Then
            Set AcroPDPage = AcroPDDoc.AcquirePage(p)
            Set AcroTextSelect = AcroPDPage.CreatePageHilite(AcroHiliteList)
            If Not AcroTextSelect Is Nothing Then
                i = 0
                While i < AcroTextSelect.GetNumText And Search_PDF = False
                    If InStr(1, AcroTextSelect.GetText(i), findText, vbTextCompare) Then Search_PDF = True
                    i = i + 1
                Wend
            End If
        End If
        p = p + 1
    Wend
       
    AcroPDDoc.Close
     
End Function
 
Upvote 0
This macro uses the Acrobat API to search the file contents.

This code is awesome. I have found that ALL of these files have owner permissions (Require a password to edit or highlight.)
I do have the password, and it is the same password for all of the files.
Currently this VBA will give a false regardless, because it can't highlight anything.
If the Owner Permissions password was "FOOEY" where would I add that bit in, so the file could be edited?[/QUOTE]
 
Upvote 0
Nevermind....I was able to find an old Action for Acrobat that works with the new version.
I was able to remove the password (owner permissions) and run the vba with great success. In case anyone was wondering, 6 bad files out of 3800.

Thank you both for all of your help.
 
Upvote 0
You've found a working solution, however see if the following helps.

This code is awesome. I have found that ALL of these files have owner permissions (Require a password to edit or highlight.)
I do have the password, and it is the same password for all of the files.
Currently this VBA will give a false regardless, because it can't highlight anything.
If the Owner Permissions password was "FOOEY" where would I add that bit in, so the file could be edited?
You are correct that the code does something similar to a select all with the AcroPDPage.CreatePageHilite line, but because the password prevents editing, the
AcroTextSelect.GetNumText returns 0 (the number of text strings highlighted) and it therefore can't find the specified text.

I don't think it is possible to enter the password or disable it temporarily with the Acrobat API, however I have found a much simpler method using AVDoc.FindText which works whether or not the PDF has a password to restrict editing.

Replace the Search_PDF function with this code:
VBA Code:
Private Function Search_PDF(PDFfullName As String, TextToFind As String) As Boolean
  
    Dim AcroPDDoc As Object
    Dim AcroAVDoc As Object
  
    Set AcroPDDoc = CreateObject("AcroExch.PDDoc")
    If Not AcroPDDoc.Open(PDFfullName) Then
        Search_PDF = False
        MsgBox "Unable to open " & PDFfullName
    Else
        Set AcroAVDoc = AcroPDDoc.OpenAVDoc("")
        '            AcroAVDoc.FindText(szText As String, bCaseSensitive As Long, bWholeWordsOnly As Long, bReset As Long) As Boolean
        Search_PDF = AcroAVDoc.FindText(TextToFind, 0, 1, 1)
        AcroAVDoc.Close bNoSave:=True
    End If
  
End Function
 
Upvote 0

Forum statistics

Threads
1,215,453
Messages
6,124,918
Members
449,195
Latest member
Stevenciu

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top