ExcelChampion
I have the Standard version of Adobe Acrobat. I found some code and have adapted it to do what I need - almost.
It converts the PDF to txt, but the output isn't what I get when I open the file in Acrobat and save it as text. Instead it looks like what I'd get by opening the PDF directly in Excel - gobbledygook. I'd like to save the data in the PDF as txt so that I can open it with Excel and parse it out by brute force.
Does anyone know where to find the Acrobat object model, or have a clue how to do what I need? I suspect this isn't possible, based on the search results I've looked through on Google and this site...
Code:
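Sub Import_PDF()
    Dim AcroXApp As Object
    Dim AcroXAVDoc As Object
    Dim AcroXPDDoc As Object
    Dim jsObj As Object
    Dim PDF_PATH As String, OUTPUT_PATH As String
    Dim myFile As String, OutputFile As String
    Dim startTime As Date, endTime As Date

    startTime = Now
    PDF_PATH = ThisWorkbook.Path & "\"
    OUTPUT_PATH = ThisWorkbook.Path & "\Solicitations\"
    ' Create the output folder if it is missing (done before the loop, since Dir keeps state)
    If Len(Dir(ThisWorkbook.Path & "\Solicitations", vbDirectory)) = 0 Then
        MkDir ThisWorkbook.Path & "\Solicitations"
    End If

    ' Start Acrobat once rather than once per file
    Set AcroXApp = CreateObject("AcroExch.App")
    AcroXApp.Hide

    ' List only PDF files instead of every entry in the folder
    myFile = Dir(PDF_PATH & "*.pdf")
    Do While myFile <> ""
        ' Case-insensitive check so "FILE.PDF" is picked up as well
        If LCase(Right(myFile, 4)) = ".pdf" Then
            Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
            If AcroXAVDoc.Open(PDF_PATH & myFile, "Acrobat") Then
                AcroXAVDoc.BringToFront
                Set AcroXPDDoc = AcroXAVDoc.GetPDDoc
                Set jsObj = AcroXPDDoc.GetJSObject
                OutputFile = myFile & ".txt"
                ' Export as plain text through Acrobat's JavaScript object
                jsObj.SaveAs OUTPUT_PATH & OutputFile, "com.adobe.acrobat.plain-text"
                ' Close each document inside the loop, without saving
                AcroXAVDoc.Close True
            End If
        End If
        myFile = Dir
    Loop

    AcroXApp.Exit
    endTime = Now
    MsgBox "Finished importing" & Chr(10) & "Run Time: " & Format(endTime - startTime, "hh:mm:ss"), vbInformation
End Sub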