Splitting a PDF file by MB size

prati

Hey friends,
I wonder if there is a way to split a PDF file through VBA into multiple PDF files so that each part is no larger than 8 MB.

I'm using PDFtk Server to split the PDF file by defining the exact range of pages. It splits the file into multiple files but doesn't take the file size into account.


If FileLen(File1) / 1000000 > 8 Then 'check whether the file is larger than 8 MB
    If totalPages < 100 Then 'check the total number of pages
        Wsh.Run "cmd /c PDFtk ""C:\Temp\oldfile.pdf"" cat 1-50 output ""C:\Temp\newfilePart1.pdf""", 0, True
        Wsh.Run "cmd /c PDFtk ""C:\Temp\oldfile.pdf"" cat 51-end output ""C:\Temp\newfilePart2.pdf""", 0, True
    ElseIf totalPages < 150 Then
        Wsh.Run "cmd /c PDFtk ""C:\Temp\oldfile.pdf"" cat 1-50 output ""C:\Temp\newfilePart1.pdf""", 0, True
        Wsh.Run "cmd /c PDFtk ""C:\Temp\oldfile.pdf"" cat 51-100 output ""C:\Temp\newfilePart2.pdf""", 0, True
        Wsh.Run "cmd /c PDFtk ""C:\Temp\oldfile.pdf"" cat 101-end output ""C:\Temp\newfilePart3.pdf""", 0, True
    ElseIf totalPages < 200 Then
        ....
    ElseIf totalPages < 250 Then
        ...
and so on.

The code above doesn't help me because it splits the file by number of pages and not by file size, as I want it to.

I found a program called UnityPdf. It is a free program that can do the job easily, but I don't know if there is a way to write a VBA command that uses UnityPdf to split the PDF file.

 
The Debug output in the Immediate window should show if it is skipping files. DOS commands have a maximum length of 8191 characters, so if the folder path is long this limit could be exceeded because the code repeats the folder path for every _Page_n.pdf file. If the limit is being exceeded the code can be adjusted to fix this.

What is the size of the input PDF and how many pages does it have?
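
To illustrate the length limit mentioned above, here is a minimal sketch of a check that could be run on a command string before passing it to Wsh.Run. The helper name CommandFitsLimit and the demo Sub are hypothetical and are not part of the macro in this thread. The 8191 figure is the one quoted above, and the string passed to Wsh.Run is assumed to count against it more or less in full.

```vba
'Hypothetical helper: returns True if the command string is within the
'8191-character limit quoted above for command lines
Public Function CommandFitsLimit(ByVal command As String) As Boolean
    Const MAX_CMD_LENGTH As Long = 8191
    CommandFitsLimit = (Len(command) <= MAX_CMD_LENGTH)
End Function

'Demonstration with an artificially long command string
Public Sub Demo_CommandFitsLimit()
    Dim command As String
    command = "cmd /c PDFtk " & String(9000, "x")
    Debug.Print Len(command); CommandFitsLimit(command)
End Sub
```

If the check fails for a Part, one option would be to catenate fewer page files per command, or to shorten the file names or folder path.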
Hey Master,

The problem has stopped and doesn't occur anymore.
I have tried several files with 500-600 pages in total and sizes of 40-50 MB.
Somehow everything works great: the files are split into parts and all the separate page files are deleted as well.

I think the problem was solved after restarting my PC...
Does that make sense?

Thank you
You are so brilliant, smart, a genius
 
I'm glad it works for you - thanks for your kind words.

Here's an improved version which deletes the _Part_nnn.pdf and _Page_nnn.pdf files at the start and deletes all the _Page_nnn.pdf files at the end, rather than the set of _Page_nnn.pdf files for each catenated Part. It also sets the input folder (using the DOS CD /D "C:\path\to\" command) once per Part, rather than including it for every _Page_nnn.pdf file, which results in much shorter command lines.
VBA Code:
Option Explicit

Const Q As String = """"

Public Sub PDFtk_Split_PDF_By_Size()

    Dim Wsh As Object 'WshShell
    Dim command As String
    Dim PDFinputFile As String, PDFfolder As String
    Dim maxFileSizeKB As Long
    Dim pageFile As String
    Dim page As Long
    Dim totalFileSizeKB As Single, thisFileSizeKB As Single
    Dim pageFiles As String
    Dim part As Long
    
    'PDF file to be split into multiple parts
    
    PDFinputFile = "C:\path\to\file.pdf"
    
    'Maximum size of each part in kilobytes
    
    maxFileSizeKB = 2048
    
    Set Wsh = CreateObject("WScript.Shell")  'New WshShell
    
    If Dir(PDFinputFile) <> vbNullString Then
    
        PDFfolder = Left(PDFinputFile, InStrRev(PDFinputFile, "\"))
        
        'Delete existing _Page_nnn.pdf and _Part_nnn.pdf files for the input file
        
        command = "cmd /c DEL " & Q & Replace(PDFinputFile, ".pdf", "_Page_*.pdf") & Q
        Debug.Print Time; command
        Wsh.Run command, 0, True
        command = "cmd /c DEL " & Q & Replace(PDFinputFile, ".pdf", "_Part_*.pdf") & Q
        Debug.Print Time; command
        Wsh.Run command, 0, True

        'Run PDFtk burst command to create multiple _Page_nnn.pdf files, one for each page in the input PDF

        command = "cmd /c PDFtk " & Q & PDFinputFile & Q & " burst output " & Q & Replace(PDFinputFile, ".pdf", "_Page_%03d.pdf") & Q
        Debug.Print Time; command
        Wsh.Run command, 0, True
        
        'Loop through the _Page_nnn.pdf files in order and create _Part_nnn.pdf files whose size is less than the maximum file size.
        
        totalFileSizeKB = 0
        pageFiles = ""
        page = 0
        part = 0
        Do
            page = page + 1
            'Get the next _Page_nnn.pdf file
            pageFile = Dir(Replace(PDFinputFile, ".pdf", "_Page_" & Format(page, "000") & ".pdf"))
            If pageFile <> vbNullString Then
                thisFileSizeKB = FileLen(PDFfolder & pageFile) / 1024
                'Is this PDF page file size plus the current total file size less than the maximum file size?
                If totalFileSizeKB + thisFileSizeKB <= maxFileSizeKB Then
                    'Yes, so add this PDF page file to the string of files and increment the current total file size
                    pageFiles = pageFiles & Q & pageFile & Q & " "
                    totalFileSizeKB = totalFileSizeKB + thisFileSizeKB
                Else
                    'No, so run PDFtk cat command to catenate the current PDF page files to the next PDF file named _Part_nnn.pdf
                    part = part + 1
                    command = "cmd /c CD /D " & Q & PDFfolder & Q & " & PDFtk " & pageFiles & "cat output " & Q & Replace(PDFinputFile, ".pdf", "_Part_" & Format(part, "000") & ".pdf") & Q
                    Debug.Print Time; command
                    Wsh.Run command, 0, True
                    'Initialise the PDF page files with this PDF file and the total file size
                    pageFiles = Q & pageFile & Q & " "
                    totalFileSizeKB = thisFileSizeKB
                End If
            End If
        Loop Until pageFile = vbNullString
        
        'If the current PDF page files isn't empty then run PDFtk cat command to catenate them to the next PDF file named _Part_nnn.pdf
        
        If pageFiles <> "" Then
            part = part + 1
            command = "cmd /c CD /D " & Q & PDFfolder & Q & " & PDFtk " & pageFiles & "cat output " & Q & Replace(PDFinputFile, ".pdf", "_Part_" & Format(part, "000") & ".pdf") & Q
            Debug.Print Time; command
            Wsh.Run command, 0, True
        End If
        
        'Delete all _Page_nnn.pdf files for the input file
        
        command = "cmd /c DEL " & Q & Replace(PDFinputFile, ".pdf", "_Page_*.pdf") & Q
        Debug.Print Time; command
        Wsh.Run command, 0, True
        
        'Delete doc_data.txt file created by PDFtk burst command
        
        If Dir(PDFfolder & "doc_data.txt") <> vbNullString Then Kill PDFfolder & "doc_data.txt"
        
        MsgBox "Done"
                    
    Else
    
        MsgBox "Error opening PDF file " & PDFinputFile
    
    End If
    
End Sub
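
The macro above decides where to cut by adding up the sizes of the individual _Page_nnn.pdf files, and the size of a catenated _Part_nnn.pdf file may not exactly equal that sum. As a rough way to confirm the result, the following sketch lists the size of each Part and flags any that exceed the limit. The Sub name List_Part_Sizes is hypothetical, and the path and limit are assumed to match the values used in the macro.

```vba
'Hypothetical check: print the size of every _Part_nnn.pdf file for the input PDF
Public Sub List_Part_Sizes()

    Dim PDFinputFile As String, PDFfolder As String
    Dim partFile As String
    Dim maxFileSizeKB As Long
    Dim sizeKB As Double

    'Assumed to be the same values as in PDFtk_Split_PDF_By_Size
    PDFinputFile = "C:\path\to\file.pdf"
    maxFileSizeKB = 2048

    PDFfolder = Left(PDFinputFile, InStrRev(PDFinputFile, "\"))

    'Dir with a wildcard returns the first match; Dir() with no argument returns the next
    partFile = Dir(Replace(PDFinputFile, ".pdf", "_Part_*.pdf"))
    Do While partFile <> vbNullString
        sizeKB = FileLen(PDFfolder & partFile) / 1024
        Debug.Print partFile; " "; Format(sizeKB, "0.0"); " KB"; IIf(sizeKB > maxFileSizeKB, "  <-- over limit", "")
        partFile = Dir()
    Loop

End Sub
```

If any Part is reported as over the limit, lowering maxFileSizeKB slightly in the macro leaves some headroom.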
 
Everything you write is clearly a masterpiece of code.
 
@John_w just out of curiosity, I tried to split my file, which is about 32 MB, but nothing happens in the folder that contains the file. After the code runs it just shows the "Done" message box, without splitting the file.

What did I miss, please?
 
Is the .pdf secured? If so, PDFtk burst needs the password to extract pages. Looking at the manual, it seems you can supply the password with the input_pw option. First try it manually in a command window:
Code:
PDFtk "C:\path\to\file.pdf" input_pw "yourPassword" burst output "C:\path\to\file_Page_%03d.pdf"
and if that works modify the macro to include the password in the code.
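
One way the macro might be modified is sketched below. The function name BuildBurstCommand and its optional password parameter are hypothetical; only the input_pw option and the burst command line come from the posts above. This has not been tested against a secured PDF.

```vba
'Hypothetical helper: builds the PDFtk burst command line used by the macro,
'adding input_pw only when a password is supplied
Public Function BuildBurstCommand(ByVal PDFinputFile As String, Optional ByVal password As String = "") As String

    Const Q As String = """"
    Dim command As String

    command = "cmd /c PDFtk " & Q & PDFinputFile & Q
    If password <> "" Then
        command = command & " input_pw " & Q & password & Q
    End If
    command = command & " burst output " & Q & Replace(PDFinputFile, ".pdf", "_Page_%03d.pdf") & Q

    BuildBurstCommand = command

End Function
```

In the macro, the line that assigns the burst command would then become command = BuildBurstCommand(PDFinputFile, "yourPassword"), with the rest of the code unchanged.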
 
Is the .pdf secured? If so, PDFtk burst needs the password to extract pages.

Absolutely not.
 
Run a manual burst command and see if an error is displayed:
Code:
PDFtk "C:\path\to\file.pdf" burst output "C:\path\to\file_Page_%03d.pdf"
 
It opens the file directly.
I don't understand that (burst doesn't open the input .pdf file), and you aren't really giving enough information to determine what's happening. Examine the debug output to see what the code is doing.
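
As an aid when examining the debug output, the sketch below also records the exit code of each command. It relies on WshShell.Run returning the program's exit code when the third argument is True. The wrapper name RunAndLog is hypothetical, and treating a non-zero code as a failure is an assumption about how cmd and PDFtk report errors.

```vba
'Hypothetical wrapper: runs a command hidden, waits for it to finish,
'and prints the exit code next to the command in the Immediate window
Public Function RunAndLog(ByVal command As String) As Long

    Dim Wsh As Object 'WshShell
    Dim exitCode As Long

    Set Wsh = CreateObject("WScript.Shell")

    'With the third argument True, Run waits and returns the exit code
    exitCode = Wsh.Run(command, 0, True)
    Debug.Print Time; "exit code"; exitCode; command

    RunAndLog = exitCode

End Function
```

Replacing each Wsh.Run command, 0, True line in the macro with RunAndLog command would show which step, if any, reports an error.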

There is a small bug which occurs if maxFileSizeKB is less than the size of a _Page_nnn.pdf file, fixed by changing the Else part of the main loop as follows:
VBA Code:
                Else
                    'No, so run PDFtk cat command to catenate the current PDF page files to the next PDF file named _Part_nnn.pdf
                    If pageFiles <> "" Then
                        part = part + 1
                        command = "cmd /c CD /D " & Q & PDFfolder & Q & " & PDFtk " & pageFiles & "cat output " & Q & Replace(PDFinputFile, ".pdf", "_Part_" & Format(part, "000") & ".pdf") & Q
                        Debug.Print Time; command
                        Wsh.Run command, 0, True
                    End If
                    'Initialise the PDF page files with this PDF file and the total file size
                    pageFiles = Q & pageFile & Q & " "
                    totalFileSizeKB = thisFileSizeKB
                End If
 
Sorry, this is what I got:
Code:
cmd /c PDFtk "C:\Users\mcc\Desktop\ff\file.pdf" burst output "C:\Users\mcc\Desktop\ff\file_Page_%03d.pdf"
 