VBA Page Count

krishhi

Active Member
Joined
Sep 8, 2008
Messages
328
Hello Everyone,

I have nearly 1500 Pdf files, Each have its own name. So, I have to Open the pdf and List the page numbers in Excel. So, I need a macro code to open the pdf and count the page numbers and post it into excel.

Can i Get Any Help from you Guys,
Krish.
 

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.
Interesting problem I looked around a little and found some code at another site. The link is http://www.a1vbcode.com/snippet-3857.asp it's Visual Basic code but it seemed be inline with what you want although I did have some problems. I looped through it few times and it got the number of pages plus an extra digit or two. Example the pdf had 418 pages returned 4183 or 15 page returned 1511. So it might need to be modified a little to work correctly/ at all.

Hope someone can provide you with a better solution but I thought I'd give you a place start. I'll try to do some debugging on it when I get some time.
 
Upvote 0
I rewrote the code that I linked to before and I while I don't have as large of a pdf sample as you do I tested it on few dozen and the result were good.
Code:
Sub PDFandNumPages()
   
   Dim Folder As Object
   Dim file As Object
   Dim fso As Object
   Dim iExtLen As Integer, iRow As Integer
   Dim sFolder As String, sExt As String
   Dim sPDFName As String

   sExt = "pdf"
   iExtLen = Len(sExt)
   iRow = 1
   ' Must have a '\' at the end of path
   sFolder = "C:\pdf_Directory\"
   
   Set fso = CreateObject("Scripting.FileSystemObject")
   
   If sFolder <> "" Then
      Set Folder = fso.GetFolder(sFolder)
      For Each file In Folder.Files
         If Right(file, iExtLen) = sExt Then
            Cells(iRow, 1).Value = file.Name
            Cells(iRow, 2).Value = pageCount(sFolder & file.Name)
            iRow = iRow + 1
         End If
      Next file
   End If

End Sub
The below code is what gets the number of pages.
Code:
Function pageCount(sFilePathName As String) As Integer

Dim nFileNum As Integer
Dim sInput As String
Dim sNumPages As String
Dim iPosN1 As Integer, iPosN2 As Integer
Dim iPosCount1 As Integer, iPosCount2 As Integer
Dim iEndsearch As Integer

' Get an available file number from the system
nFileNum = FreeFile

'OPEN the PDF file in Binary mode
Open sFilePathName For Binary Lock Read Write As #nFileNum
  
  ' Get the data from the file
  Do Until EOF(nFileNum)
      Input #1, sInput
      sInput = UCase(sInput)
      iPosN1 = InStr(1, sInput, "/N ") + 3
      iPosN2 = InStr(iPosN1, sInput, "/")
      iPosCount1 = InStr(1, sInput, "/COUNT ") + 7
      iPosCount2 = InStr(iPosCount1, sInput, "/")
      
   If iPosN1 > 3 Then
      sNumPages = Mid(sInput, iPosN1, iPosN2 - iPosN1)
      Exit Do
   ElseIf iPosCount1 > 7 Then
      sNumPages = Mid(sInput, iPosCount1, iPosCount2 - iPosCount1)
      Exit Do
   ' Prevent overflow and assigns 0 to number of pages if strings are not in binary
   ElseIf iEndsearch > 1001 Then
      sNumPages = "0"
      Exit Do
   End If
      iEndsearch = iEndsearch + 1
   Loop
   
  ' Close pdf file
  Close #nFileNum
  pageCount = CInt(sNumPages)
  
End Function
The first code search a directory for pdf's but that sub is just one way to do it. If the pdfs are in many different subdirectory I would suggest Chip Pearson's Directory Tree add-in it can generate files with path for all file types or just the extensions you choose.

Best of luck.
 
Upvote 0
This is another alternative for getting page counts of PDF files;

Code:
Sub Test()
    Dim MyPath As String, MyFile As String
    Dim i As Long
    MyPath = "C:\TestFolder"
    MyFile = Dir(MyPath & Application.PathSeparator & "*.pdf", vbDirectory)
    Range("A:B").ClearContents
    Range("A1") = "File Name": Range("B1") = "Pages"
    Range("A1:B1").Font.Bold = True
    i = 1
    Do While MyFile <> ""
        i = i + 1
        Cells(i, 1) = MyFile
        Cells(i, 2) = GetPageNum(MyPath & Application.PathSeparator & MyFile)
        MyFile = Dir
    Loop
    Columns("A:B").AutoFit
    MsgBox "Total of " & i - 1 & " PDF files have been found" & vbCrLf _
           & " File names and corresponding count of pages have been written on " _
           & ActiveSheet.Name, vbInformation, "Report..."
End Sub
'
Function GetPageNum(PDF_File As String)
    'Haluk 19/10/2008
    Dim FileNum As Long
    Dim strRetVal As String
    Dim RegExp
    Set RegExp = CreateObject("VBscript.RegExp")
    RegExp.Global = True
    RegExp.Pattern = "/Type\s*/Page[^s]"
    FileNum = FreeFile
    Open PDF_File For Binary As #FileNum
        strRetVal = Space(LOF(FileNum))
        Get #FileNum, , strRetVal
    Close #FileNum
    GetPageNum = RegExp.Execute(strRetVal).Count
End Function
 
Last edited:
Upvote 0
I just checked Haluk's solution against mine and the results didn't match for about a fourth of pdf checked. I manually checked some of the discrepancies with the pdf files themselves.

Sample size 4344 pdfs
Two function returned same value on: 3009 pdfs
Does not include pdf's with no result.
My function:
No results: 362 pdfs
Error on: 3 pdfs
The overall size of my over and under were larger. Bad results on pdfs that are part of assemblies.

Haluk's function:
No Results: 0
Error on: 1 pdf
Sometimes returned 2 or 3 times the number of pages, exactly 2 or 3 times ie 250 returned 750. Sometimes 1 page greater than actual count.

I would use both functions and mannual check the discrepencies.
 
Upvote 0
Re:VBA - working with PDFs

Hi,

I have 20 PDFs which need to be categorized into different folders and also to be renamed with a new name in the excel sheet.

<table frame="VOID" rules="NONE" border="0" cellspacing="0" cols="3"> <colgroup><col width="86"><col width="86"><col width="86"></colgroup> <tbody> <tr> <td style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000" height="47" width="86" align="LEFT">PDF Name(includes the path) </td> <td style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000" width="86" align="LEFT">New folder </td> <td style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000" width="86" align="LEFT">New name </td> </tr> <tr> <td style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000" height="17" align="LEFT">c:\xyz.pdf </td> <td style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000" align="LEFT">d:\pdf1\ </td> <td style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000" align="LEFT">abc.pdf </td> </tr> <tr> <td style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000" height="17" align="LEFT">a:\pqr.pdf </td> <td style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000" align="LEFT">d:\pdf2\ </td> <td style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000" align="LEFT">efg.pdf </td> </tr> </tbody> </table>




and so on upto 20.

Can this be done using a VBA code ?
 
Last edited:
Upvote 0
Re: VBA - working with PDFs

I've checked Haluk's function and Ralajer's function, both of them sometimes return 0 for some pdfs those are samples contains PDF-1.3, PDF-1.4, PDF-1.5, PDF-1.6 and PDF-1.7.

Then, I've modified Halu's function, that returns right page number, as far as I checked with pdfs those I have or I accessed.

Code:
' Seiji Fujita  April 13, 2016 (base code: 'Haluk 19/10/2008)
Function GetPageNum(PDF_File As String) As Long
    Dim FileNum As Long
    Dim strRetVal As String
    Dim RegExp
    Dim getpage0 As Long, getpage1 As Long, getpage2 As Long, getpage3 As Long

    Set RegExp = CreateObject("VBscript.RegExp")
    RegExp.Global = True
    RegExp.Pattern = "/Type\s*/Page[^s]"

    FileNum = FreeFile
    Open PDF_File For Binary As #FileNum
        strRetVal = Space(LOF(FileNum))
        Get #FileNum, , strRetVal
    Close #FileNum
    
    RegExp.Pattern = "/Type\s*/Page[^s]"
    getpage0 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/Resources"
    getpage1 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/ProcSet\s*\[/PDF"
    getpage2 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/Type/Catalog/Page\s*"
    getpage3 = RegExp.Execute(strRetVal).Count

    If getpage0 > 0 Then
        GetPageNum = getpage0
    Else
        GetPageNum = MathMax(MathMax(getpage1, getpage2), getpage3)
        ' you can use below with MS Excel, insted above
        ' GetPageNum = Application.WorksheetFunction.Max(getpage1, getpage2, getpage3)
    End If
End Function

Function MathMax(ByVal a As Long, ByVal b As Long) As Long
    If a >= b Then
        MathMax = a
    Else
        MathMax = b
    End If
End Function
 
Upvote 0
Re: VBA - working with PDFs

Sorry, I repost this to revise code (remove unnecessary code and so on).

Code:
' Seiji Fujita  April 13, 2016 (base code: 'Haluk 19/10/2008)
Function GetPageNum(ByVal PDF_File As String) As Long
    Dim FileNum As Long
    Dim strRetVal As String
    Dim RegExp
    Dim getpage0 As Long, getpage1 As Long, getpage2 As Long, getpage3 As Long

    FileNum = FreeFile
    Open PDF_File For Binary As #FileNum
        strRetVal = Space(LOF(FileNum))
        Get #FileNum, , strRetVal
    Close #FileNum

    Set RegExp = CreateObject("VBscript.RegExp")
    RegExp.Global = True

    RegExp.Pattern = "/Type\s*/Page[^s]"
    getpage0 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/Resources"
    getpage1 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/ProcSet\s*\[/PDF"
    getpage2 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/Type/Catalog/Page\s*"
    getpage3 = RegExp.Execute(strRetVal).Count

    If getpage0 > 0 Then
        GetPageNum = getpage0
    Else
        GetPageNum = MathMax(MathMax(getpage1, getpage2), getpage3)
        ' you can use below with MS Excel, insted above
        ' GetPageNum = Application.WorksheetFunction.Max(getpage1, getpage2, getpage3)
    End If
End Function

Function MathMax(ByVal a As Long, ByVal b As Long) As Long
    If a >= b Then
        MathMax = a
    Else
        MathMax = b
    End If
End Function
 
Upvote 0
Re: VBA - working with PDFs

I found my function got error on some pdf that needs password to browse or protected (/inhibit) edit etc. When I count page to those pdf, result is greater than actual count or is zero.

I cannot fix this problem perfectly at the moment, but I temporarily revise my function to correct error as 1 page grearer count.

Code:
' Seiji Fujita  rev.1  April 21, 2016 (base code: 'Haluk 19/10/2008)
Function GetPageNum(ByVal PDF_File As String) As Long
    Dim FileNum As Long
    Dim strRetVal As String
    Dim RegExp
    Dim getpage0 As Long, getpage1 As Long, getpage2 As Long, _
        getpage3 As Long, getpage4 As Long

    FileNum = FreeFile
    Open PDF_File For Binary As #FileNum
        strRetVal = Space(LOF(FileNum))
        Get #FileNum, , strRetVal
    Close #FileNum

    Set RegExp = CreateObject("VBscript.RegExp")
    RegExp.Global = True

    RegExp.Pattern = "/Type\s*/Page[^s]"
    getpage0 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/Resources"
    getpage1 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/ProcSet\s*\[/PDF"
    getpage2 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/Type/Catalog/Page\s*"
    getpage3 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/ExtGState"
    getpage4 = RegExp.Execute(strRetVal).Count

    If getpage0 > 0 Then
        GetPageNum = getpage0
    Else
        GetPageNum = MathMax(MathMax(getpage1, getpage2), getpage3)
        ' you can use below with MS Excel, insted above
        ' GetPageNum = Application.WorksheetFunction.Max(getpage1, getpage2, getpage3)
    End If
    If GetPageNum > getpage4 Then
        GetPageNum = GetPageNum - getpage4
    End If
End Function

Function MathMax(ByVal a As Long, ByVal b As Long) As Long
    If a >= b Then
        MathMax = a
    Else
        MathMax = b
    End If
End Function
 
Upvote 0

Forum statistics

Threads
1,214,808
Messages
6,121,684
Members
449,048
Latest member
81jamesacct

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top