VBA Reading a PDF (information missing when reading the pdf)

fanofstuff

New Member
Joined
Feb 1, 2023
Messages
1
Office Version
  1. 365
Platform
  1. Windows
Hi, i was wondering if i could get some help with a problem i was having, once per month i need to download about 10-15 pdf files (updating them) and manually take data out of the files and add them to an excel spreadsheet.
I have adobe acrobat pro V 10.0 and have been trying to create a macro/vba code to automate at least the entering of numbers into the excel sheet
I was able to run some loops to cycle through the pdf. The issue is that negative numbers do not get printed/shown as negative numbers (as well as some words are printed with "..." instead of some letters but that is less important) and this is extremely important as this has to do with stock performance.
Almost everything in the pdf is in different tables, and there are 10 pages per pdf where only 3 are needed, i wanted to know if there was a way that i could avoid, downloading all 15 pdf's, then flattening them, then exporting them to excel, then using a macro on the excel sheet (if that is even possible)
This works for pdf files, however crucial information is ommited (even if i export to excel), when i flatten it, the info is no longer lost and i can export to excel but its incredibly messy and difficult to sort through as well as takes many extra steps/more time to do the task
Please help!!
*code*

Const pdf_file As String = "path_to_pdf"


Sub read_pdf()

Dim aApp As Acrobat.AcroApp
Dim av_doc As CAcroAVDoc
Dim pdf_doc As CAcroPDDoc
Dim sel_text As CAcroPDTextSelect
Dim i As Long, j As Long
Dim pagenumber, pagecontent, content

Set aApp = CreateObject("AcroExch.App")
Set av_doc = CreateObject("AcroExch.AVDoc")


If av_doc.Open(pdf_file, vbNull) <> True Then Exit Sub

While av_doc Is Nothing
Set av_doc = aApp.GetActiveDoc
Wend

Set pdf_doc = av_doc.GetPDDoc

For i = 0 To pdf_doc.GetNumPages - 1

Set pagenumber = pdf_doc.AcquirePage(i)
Set pagecontent = CreateObject("AcroExch.HiliteList")

On Error Resume Next
If pagecontent.Add(0, 9000) <> True Then Exit Sub
Set sel_text = pagenumber.CreatePageHilite(pagecontent)
On Error GoTo 0

For j = 0 To sel_text.GetNumText - 1
Debug.Print sel_text.GetText(j)

' //Range("A" & Rows.Count).End(x1Up).Offset(1, 0).Value = sel_text.GetText(j)
Next j


Next i

av_doc.Close False
aApp.Exit

Set pagenumber = Nothing
Set sel_text = Nothing
Set av_doc = Nothing
Set aApp = Nothing


End Sub
 

Excel Facts

Using Function Arguments with nested formulas
If writing INDEX in Func. Arguments, type MATCH(. Use the mouse to click inside MATCH in the formula bar. Dialog switches to MATCH.

Forum statistics

Threads
1,214,947
Messages
6,122,411
Members
449,081
Latest member
JAMES KECULAH

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top