Open PDF from listed in column A, search for text in PDF that is in adjacent cell in Column B, and return a TRUE/FALSE Value

oreo haven

Board Regular
Joined
May 15, 2008
Messages
65
All,
I have a problem where several invoices may have the wrong information written into them. As long as the number in Column B is inside the PDF listed in Column A, I know that it is ok.
So I would like to search the file listed in column A and see if the strin listed in the corresponding row in Column B is found in it. If so, I just need a "True" to be written to the corresponding Cell in Column C.
I have 3800 invoices to check, right now. Depending on the results, I could end up with 30k more files to check.
I do have Acrobat Pro and have enabled the libraries.
Has anyone here run into this before? The table below is what I would prefer to have in the end.

FULL_PATH TO PDF
STRING_TO_FINDIs_It_There
c:\users\jsd\documents\invoices\18996_58720_INV767056.pdf767056
TRUE​
c:\users\jsd\documents\invoices\18996_58728_INV767057.pdf767057
TRUE​
c:\users\jsd\documents\invoices\18996_58736_INV767058.pdf767058
TRUE​
c:\users\jsd\documents\invoices\18996_58744_INV767059.pdf767059
FALSE​
c:\users\jsd\documents\invoices\18996_58752_INV767060.pdf767060
TRUE​
c:\users\jsd\documents\invoices\18996_58760_INV767065.pdf767065
TRUE​
c:\users\jsd\documents\invoices\18996_58768_INV767066.pdf767066
FALSE​
c:\users\jsd\documents\invoices\18996_58776_INV767067.pdf767067
FALSE​
c:\users\jsd\documents\invoices\18996_58784_INV767069.pdf767069
TRUE​
 

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.
maybe
Result
String_To_FindNameTBDExist
76705618996_58720_INV767056.pdf767056767056
76705718996_58728_INV767057.pdf767057767057
76705818996_58736_INV767058.pdf767058767058
76705918996_58744_INV767059.pdf767059767059
76706018996_58752_INV767060.pdf767060767060
76706518996_58752_INV767061.pdf767061
76706618996_58760_INV767065.pdf767065767065
76706718996_58768_INV767066.pdf767066767066
76706918996_58776_INV767067.pdf767067767067
18996_58776_INV767068.pdf767068
18996_58784_INV767069.pdf767069767069

Code:
// invoices
let
    Source = Folder.Files("D:\test\oreo\invoices"),
    TSC = Table.SelectColumns(Source,{"Name"}),
    TBD = Table.AddColumn(TSC, "TBD", each Text.BetweenDelimiters([Name], "INV", ".", {0, RelativePosition.FromEnd}, 0), type text),
    Merge = Table.NestedJoin(TBD,{"TBD"},STF,{"String_To_Find"},"STF",JoinKind.LeftOuter),
    Expand = Table.ExpandTableColumn(Merge, "STF", {"String_To_Find"}, {"Exist"})
in
    Expand
Code:
// STF
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Type = Table.TransformColumnTypes(Source,{{"String_To_Find", type text}})
in
    Type
you'll need to change this D:\test\oreo\invoices to your original path
 
Last edited:
Upvote 0
If a moderator sees this, can it be moved to the regular Excel Forum? I actually have no idea how I posted in BI.
 
Upvote 0
Trying this again. Somehow I was in the BI side of the forum and posted there, which was a miserable oversight on my part.
All,
I have a problem where several invoices may have the wrong information written into them. As long as the number in Column B is inside the PDF listed in Column A, I know that it is ok.
So I would like to search the file listed in column A and see if the string listed in the corresponding row in Column B is found in it. If so, I just need a "True" to be written to the corresponding Cell in Column C.
I have 3800 invoices to check, right now. Depending on the results, I could end up with 30k more files to check.
I do have Acrobat Pro and have enabled the libraries.
Has anyone here run into this before? The table below is what I would prefer to have in the end.
I have seen different code doing this. I'm guessing it would be best to hard code my path, and just have the filename in column A. That is easy enough to do.

path and filenamestring to findIs_It_There
c:\users\jsd\documents\invoices\18996_58720_INV767056.pdf767056TRUE
c:\users\jsd\documents\invoices\18996_58728_INV767057.pdf767057TRUE
c:\users\jsd\documents\invoices\18996_58736_INV767058.pdf767058TRUE
c:\users\jsd\documents\invoices\18996_58744_INV767059.pdf767059FALSE
c:\users\jsd\documents\invoices\18996_58752_INV767060.pdf767060TRUE
c:\users\jsd\documents\invoices\18996_58760_INV767065.pdf767065TRUE
c:\users\jsd\documents\invoices\18996_58768_INV767066.pdf767066FALSE
c:\users\jsd\documents\invoices\18996_58776_INV767067.pdf767067FALSE
c:\users\jsd\documents\invoices\18996_58784_INV767069.pdf767069TRUE
 
Upvote 0
Duplicate
Because this question is in two different threads, please either delete this one and move the one I erroneously placed in the BI forum, or delete the one in the BI forum.
 
Upvote 0

Forum statistics

Threads
1,214,944
Messages
6,122,392
Members
449,081
Latest member
JAMES KECULAH

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top