Search and extract text from PDF to XL

vbacoder

Active Member
Joined
Jul 7, 2007
Messages
354
Hi everyone,

I'm looking for a method (in VBA) to open a PDF file from a hyperlink and search for text in the first few rows of the file and then return these to the worksheet. I have a very long list of files to go through and I really nees to automate this.

Thank you for any assistance.

vcoder
 

Excel Facts

How to total the visible cells?
From the first blank cell below a filtered data set, press Alt+=. Instead of SUM, you will get SUBTOTAL(9,)
I've had moderate success doing this from Access using PDF-Analyzer but haven't tried from Excel.

It's a free download from http://www.pdf-analyzer.com/. You might want to try the DLL and the EXE and decide which works best for you.
 
Upvote 0
Hello,

Many thanks for your response Ruddles. I would prefer to create something in XL VBA as it is transportable, whereas the add-in needs to be installed.

I can take care of file opening and closing, I just need a function to either conduct a ctrl+F search in the PDF to find text and then copy it to a variable or some similar search process to locate words, copy then and extract.

vcoder
 
Upvote 0
I guess it may need some sort of shell script to execute commands within the PDF file. So maybe a VBA function that calls an external script?
 
Upvote 0
I guess you would have to start an instance of Adobe Reader from a worksheet, use SendKeys to search for your required text, copy it to the Windows clipboard and from there extract it into a variable or paste it into the worksheet. I've never tried that approach. SendKeys is notoriously flaky and I'm not aware that you can drive Adobe Reader from Excel in the same way as you can with Access, Word or Outlook, for example..

The way I tackled it was to convert the PDF (or the parts of it I was interested in) into plain text using an external program and then process the plain text which is easy-peasy.
 
Upvote 0
Thanks for your further input Ruddles.

Converting to text sounds better, but I would prefer to do the following, via an XL VBA macro:

Open PDF at hyperlink
Search for and extract matched key words
Copy extracted words
Paste extracted words in XL sheet

If anyone has tried and succeeded in doing similar, please advise.

Many thanks,

vcoder
 
Upvote 0
Do you have only a pdf reader or full-blown adobe acrobat? It's possible to get at pdf documents using acrobat's objects. I'm not sure if it can be done with reader. I think it's a bit of a pain though - I've got the t-shirt so I'm done with it.

My results were posted here but later I changed the actual code I used (and there's a link in my post to another example as well). Rather than "searching" the document it was easier to load it page by page and search the text word by word. Took me a darned four hours to figure this out - I did need acrobat to do it, not just reader. Also, I also don't know if working in a file opened through a browser makes any difference or not (I wasn't doing so). Creating a Javascript object was helpful -- dimming an object and calling the getJSObject method from acrobat. Acrobat provides some good tools for working with pdf's through this interface. But you get no intellisense doing this in vba {sigh}. Some resources on this latter are here:
http://www.adobe.com/devnet/acrobat/javascript.html

I really don't know if this is any better than the analyzer Ruddles suggested or not - it was a lot of work for me but I was starting from scratch at the time. A utility could be very helpful - and may be necessary if you only have reader. I found it wasn't easy to rely on searching even in a local file opened with the acrobat program itself, using its own search methods, so I think sendkeys would just be a mess.
 
Upvote 0
Hi Xenou, Ruddles

Thank you both very much for your posts and links!

Best regards,

vcoder
 
Upvote 0

Forum statistics

Threads
1,214,833
Messages
6,121,857
Members
449,051
Latest member
excelquestion515

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top