Splitting a PDF file by size (MB)

prati

Board Regular
Joined: Jan 25, 2021
Messages: 51
Office Version: 2019
Platform: Windows
Hey friends,
I wonder if there is a way to split a PDF file through VBA into multiple PDF files so that no part is larger than 8 MB.

I'm using PDFtk Server to split the PDF file by defining exact page ranges. It splits the file into multiple files but doesn't take the file size into account.

If FileLen(File1) / 1000000 > 8 Then 'check whether the file is larger than 8 MB
    If totalPages < 100 Then 'check the total number of pages
        Wsh.Run "cmd /c PDFtk " & "C:\Temp\oldfile.pdf" & " cat 1-50 output C:\Temp\newfilePart1.pdf", 0, True
        Wsh.Run "cmd /c PDFtk " & "C:\Temp\oldfile.pdf" & " cat 51-end output C:\Temp\newfilePart2.pdf", 0, True
    ElseIf totalPages < 150 Then
        Wsh.Run "cmd /c PDFtk " & "C:\Temp\oldfile.pdf" & " cat 1-50 output C:\Temp\newfilePart1.pdf", 0, True
        Wsh.Run "cmd /c PDFtk " & "C:\Temp\oldfile.pdf" & " cat 51-100 output C:\Temp\newfilePart2.pdf", 0, True
        Wsh.Run "cmd /c PDFtk " & "C:\Temp\oldfile.pdf" & " cat 101-end output C:\Temp\newfilePart3.pdf", 0, True
    ElseIf totalPages < 200 Then
        '....
    ElseIf totalPages < 250 Then
        '...
    'and so on.
    End If
End If

The code above doesn't help me because it splits the file by number of pages, not by file size as I want.

I found a program called UnityPdf. It is free and can do the job easily, but I don't know whether there is a way to write a VBA command that uses UnityPdf to split the PDF file.
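For reference, the page-count If/ElseIf ladder in the first post can be collapsed into a loop. This is a minimal sketch built around a hypothetical helper, PageRanges, that produces PDFtk range strings such as "1-50", "51-100", "101-end". It still splits by page count, so it does not solve the size problem; it only removes the need for one branch per page total. File1, C:\Temp and the newfilePartN.pdf names follow the first post.

```vba
Option Explicit

'Hypothetical helper (not part of PDFtk): builds PDFtk page-range strings
'of pagesPerPart pages each, with the last range ending in "end".
'NOTE: this still splits by page count, not by file size.
Public Function PageRanges(ByVal totalPages As Long, ByVal pagesPerPart As Long) As Collection
    Dim ranges As New Collection
    Dim firstPage As Long, lastPage As Long

    If pagesPerPart < 1 Then Err.Raise 5   'invalid argument; would otherwise loop forever

    firstPage = 1
    Do While firstPage <= totalPages
        lastPage = firstPage + pagesPerPart - 1
        If lastPage >= totalPages Then
            ranges.Add firstPage & "-end"     'last part runs to the end of the document
        Else
            ranges.Add firstPage & "-" & lastPage
        End If
        firstPage = lastPage + 1
    Loop

    Set PageRanges = ranges
End Function

'Example usage: one PDFtk cat command per range, 50 pages per part
Public Sub SplitByPageCount(ByVal File1 As String, ByVal totalPages As Long)
    Dim Wsh As Object, ranges As Collection, i As Long
    Set Wsh = CreateObject("WScript.Shell")
    Set ranges = PageRanges(totalPages, 50)
    For i = 1 To ranges.Count
        Wsh.Run "cmd /c PDFtk " & File1 & " cat " & ranges(i) & _
                " output C:\Temp\newfilePart" & i & ".pdf", 0, True
    Next i
End Sub
```

For a 120-page file this yields "1-50", "51-100" and "101-end", the same ranges as the totalPages < 150 branch.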

You could use PDFtk burst followed by several cat commands.
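To make the burst-then-cat idea concrete, here is a minimal sketch of the two PDFtk command strings involved, built for Wsh.Run. The helper names BurstCommand and CatCommand are hypothetical; the PDFtk syntax ("input.pdf burst output pattern" and "file1 file2 ... cat output file") is the one used elsewhere in this thread.

```vba
Option Explicit

Private Const Q As String = """"   'a single double-quote character, to wrap paths containing spaces

'Hypothetical helper: PDFtk burst writes one PDF per page.
'outputPattern is printf-style, e.g. "C:\Temp\A_%03d.pdf" gives A_001.pdf, A_002.pdf, ...
Public Function BurstCommand(ByVal inputFile As String, ByVal outputPattern As String) As String
    BurstCommand = "cmd /c PDFtk " & Q & inputFile & Q & " burst output " & Q & outputPattern & Q
End Function

'Hypothetical helper: PDFtk cat joins the listed files, in order, into outputFile
Public Function CatCommand(ByVal files As Collection, ByVal outputFile As String) As String
    Dim f As Variant, cmd As String
    cmd = "cmd /c PDFtk "
    For Each f In files
        cmd = cmd & Q & f & Q & " "
    Next f
    CatCommand = cmd & "cat output " & Q & outputFile & Q
End Function
```

Each string is then passed to Wsh.Run, for example Wsh.Run BurstCommand("C:\Temp\A.pdf", "C:\Temp\A_%03d.pdf"), 0, True, followed by one Wsh.Run CatCommand(...) per group of page files.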
Hey,
When you say "burst" the file, do you mean: split a PDF file that has hundreds or thousands of pages into individual pages,
and then guess/check how many pages I need to merge with the cat command in order to create several PDF files that are each smaller than 8 MB?

For example, if I have a PDF with 2000 pages, I would split it into
1.pdf, 2.pdf, 3.pdf, 4.pdf, 5.pdf, 6.pdf, 7.pdf, ..., 500.pdf, ..., 2000.pdf

And then try to merge, for example:
pages 1-100
pages 101-400
pages 401-450
pages 451-2000...

If I understand correctly, it will be a manual process: I will have to check or guess how many pages to merge, and it will not be automatic?
 
You could automate the whole process with VBA. PDFtk burst produces multiple File n.pdf files, one per page of the original PDF. Then loop through those files in order using Dir(), get the size of each file with the FileLen function, and build a string of File n.pdf file names, up to your maximum file size, to use with PDFtk cat.
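The loop described above is a greedy grouping: pages are taken in order, and a new part is started whenever the next page would push the running total over the limit. This is a minimal sketch of just that decision, separated from the file handling, assuming the page sizes have already been read into an array (AssignParts is a hypothetical name).

```vba
Option Explicit

'Greedy grouping sketch: walk the page sizes in order and start a new part whenever
'adding the next page would exceed maxSize. Returns the 1-based part number of each page.
'A single page larger than maxSize still gets a part of its own, since it cannot be split.
Public Function AssignParts(sizes() As Double, ByVal maxSize As Double) As Long()
    Dim parts() As Long
    Dim i As Long, part As Long
    Dim total As Double

    ReDim parts(LBound(sizes) To UBound(sizes))
    part = 1
    total = 0
    For i = LBound(sizes) To UBound(sizes)
        If total > 0 And total + sizes(i) > maxSize Then
            part = part + 1   'current part is full: start the next one
            total = 0
        End If
        total = total + sizes(i)
        parts(i) = part
    Next i

    AssignParts = parts
End Function
```

For example, page sizes of 3, 2, 2, 4, 1 and 2 MB with an 8 MB limit give parts 1, 1, 1, 2, 2, 2. The combined PDF will not be exactly the sum of its page files, so leaving some headroom below the hard limit may be sensible.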
 
Thank you.
As I am still a beginner, maybe I have a lot to learn before trying to do something in VBA.
The best I managed is to split the file A.pdf, located in C:\Temp, into individual pages:

Wsh.Run "cmd /c PDFtk " & "C:\Temp\A.pdf" & " burst output C:\Temp\A_%01d.pdf", 0, True

And then I deleted the A.pdf file and the doc_data.txt file:

Kill "C:\TempAnnexes\Temp\A.pdf"
Kill "C:\TempAnnexes\Temp\doc_data.txt"

This leaves only the page files (A_1.pdf, A_2.pdf, ...) in C:\Temp.


From here I have no idea how to build a string of files up to 8 MB and produce PDF files of at most 8 MB.

In this example, the string you mentioned should produce 2 PDF files.
Part 1 should include pages A_1 + A_2 + A_3 = part1.pdf (together these pages are less than 8 MB).
I can do it manually:
Wsh.Run "cmd /c PDFtk " & "C:\Temp\A_1.pdf" & " C:\Temp\A_2.pdf" & " C:\Temp\A_3.pdf" & " cat output C:\Temp\part1.pdf", 0, True

Part 2 should include pages A_4 + A_5 + A_6 = part2.pdf (together these pages are less than 8 MB).
I can do it manually:
Wsh.Run "cmd /c PDFtk " & "C:\Temp\A_4.pdf" & " C:\Temp\A_5.pdf" & " C:\Temp\A_6.pdf" & " cat output C:\Temp\part2.pdf", 0, True

However, the code I wrote isn't smart at all.

I can also manually create a lot of variables:
L1 = FileLen("C:\Temp\A_1.pdf") / 1000000 'the size in MB
L2 = FileLen("C:\Temp\A_2.pdf") / 1000000 'the size in MB
L3 = FileLen("C:\Temp\A_3.pdf") / 1000000 'the size in MB
L4 = FileLen("C:\Temp\A_4.pdf") / 1000000 'the size in MB
L5 = FileLen("C:\Temp\A_5.pdf") / 1000000 'the size in MB
L6 = FileLen("C:\Temp\A_6.pdf") / 1000000 'the size in MB
Sum1 = L1
Sum2 = L1 + L2
Sum3 = L1 + L2 + L3
Sum4 = L1 + L2 + L3 + L4
Sum5 = L1 + L2 + L3 + L4 + L5
Sum6 = L1 + L2 + L3 + L4 + L5 + L6

I can't get any further from there.
Maybe I have to learn VBA from the basics in order to build a string of the A_n.pdf files up to 8 MB.
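The per-page variables L1...L6 and the running sums Sum1...Sum6 above can be replaced by a loop. This is a minimal sketch, assuming the burst files are named A_1.pdf, A_2.pdf, ... in C:\Temp (as produced by the burst command in this post) and are numbered without gaps; PageSizesMB, RunningTotals and ShowPageSizes are hypothetical names.

```vba
Option Explicit

'Reads the sizes (in MB, using 1,000,000 bytes as in the post) of folder & prefix & "1.pdf",
'folder & prefix & "2.pdf", ... stopping at the first page number with no file.
Public Function PageSizesMB(ByVal folder As String, ByVal prefix As String) As Collection
    Dim sizes As New Collection
    Dim n As Long

    n = 1
    Do While Dir(folder & prefix & n & ".pdf") <> vbNullString
        sizes.Add FileLen(folder & prefix & n & ".pdf") / 1000000
        n = n + 1
    Loop
    Set PageSizesMB = sizes
End Function

'Loop equivalent of Sum1 = L1, Sum2 = L1 + L2, Sum3 = L1 + L2 + L3, ...
Public Function RunningTotals(ByVal sizes As Collection) As Collection
    Dim totals As New Collection
    Dim s As Variant, total As Double

    For Each s In sizes
        total = total + s
        totals.Add total
    Next s
    Set RunningTotals = totals
End Function

'Example: print each page's size and the running total for C:\Temp\A_1.pdf, A_2.pdf, ...
Public Sub ShowPageSizes()
    Dim sizes As Collection, totals As Collection, i As Long
    Set sizes = PageSizesMB("C:\Temp\", "A_")
    Set totals = RunningTotals(sizes)
    For i = 1 To sizes.Count
        Debug.Print "A_" & i & ".pdf", Format(sizes(i), "0.00") & " MB", "total " & Format(totals(i), "0.00") & " MB"
    Next i
End Sub
```

The point where the running total first exceeds 8 MB is where a part has to end and the next one begin.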
 
Try this macro, which splits the specified PDF input file (C:\path\to\file.pdf) into multiple parts (named file_Part_nnn.pdf in the same folder as the input file) whose maximum file size is 2048 KB.

There are plenty of comments and Debug.Print statements to help you understand the code.
VBA Code:
Option Explicit

Const Q As String = """"

Public Sub PDFtk_Split_PDF_By_Size()

    Dim Wsh As Object 'WshShell
    Dim command As String
    Dim PDFinputFile As String, PDFfolder As String
    Dim maxFileSizeKB As Long
    Dim pageFile As String
    Dim page As Long
    Dim totalFileSizeKB As Single, thisFileSizeKB As Single
    Dim pageFiles As String
    Dim part As Long
    
    'PDF file to be split into multiple parts
    
    PDFinputFile = "C:\path\to\file.pdf"
    
    'Maximum size of each part in kilobytes
    
    maxFileSizeKB = 2048
    
    Set Wsh = CreateObject("WScript.Shell")  'New WshShell
    
    If Dir(PDFinputFile) <> vbNullString Then
    
        PDFfolder = Left(PDFinputFile, InStrRev(PDFinputFile, "\"))        

        'Run PDFtk burst command to create multiple _Page_nnn.pdf files, one for each page in the input PDF

        command = "cmd /c PDFtk " & Q & PDFinputFile & Q & " burst output " & Q & Replace(PDFinputFile, ".pdf", "_Page_%03d.pdf") & Q
        Debug.Print Time; command
        Wsh.Run command, 0, True
        
        'Loop through the _Page_nnn.pdf files in order and create _Part_nnn.pdf files whose size is less than the maximum file size.
        
        totalFileSizeKB = 0
        pageFiles = ""
        page = 0
        part = 0
        Do
            page = page + 1
            'Get the next _Page_nnn.pdf file
            pageFile = Dir(Replace(PDFinputFile, ".pdf", "_Page_" & Format(page, "000") & ".pdf"))
            If pageFile <> vbNullString Then
                thisFileSizeKB = FileLen(PDFfolder & pageFile) / 1024
                'Is this PDF page file size plus the current total file size less than the maximum file size?
                If totalFileSizeKB + thisFileSizeKB <= maxFileSizeKB Then
                    'Yes, so add this PDF page file to the string of files and increment the current total file size
                    pageFiles = pageFiles & Q & PDFfolder & pageFile & Q & " "
                    totalFileSizeKB = totalFileSizeKB + thisFileSizeKB
                Else
                    'No, so run PDFtk cat command to catenate the current PDF page files to the next PDF file named _Part_nnn.pdf
                    part = part + 1
                    command = "cmd /c PDFtk " & pageFiles & "cat output " & Q & Replace(PDFinputFile, ".pdf", "_Part_" & Format(part, "000") & ".pdf") & Q
                    Debug.Print Time; command
                    Wsh.Run command, 0, True
                    'Delete the current PDF page files
                    command = "cmd /c DEL " & pageFiles
                    Debug.Print Time; command
                    Wsh.Run command, 0, True
                    'Initialise the PDF page files with this PDF file and the total file size
                    pageFiles = Q & PDFfolder & pageFile & Q & " "
                    totalFileSizeKB = thisFileSizeKB
                End If
            End If
        Loop Until pageFile = vbNullString
        
        'If the current PDF page files isn't empty then run PDFtk cat command to catenate them to the next PDF file named _Part_nnn.pdf
        
        If pageFiles <> "" Then
            part = part + 1
            command = "cmd /c PDFtk " & pageFiles & "cat output " & Q & Replace(PDFinputFile, ".pdf", "_Part_" & Format(part, "000") & ".pdf") & Q
            Debug.Print Time; command
            Wsh.Run command, 0, True
            'Delete the current PDF page files
            command = "cmd /c DEL " & pageFiles
            Debug.Print Time; command
            Wsh.Run command, 0, True
        End If
        
        'Delete doc_data.txt file created by burst command
        
        If Dir(PDFfolder & "doc_data.txt") <> vbNullString Then Kill PDFfolder & "doc_data.txt"
        
        MsgBox "Done"
                    
    Else
    
        MsgBox "Error opening PDF file " & PDFinputFile
    
    End If
    
End Sub
 
Hey Genius,

The code works perfectly and it is exactly what I need.
There is an issue regarding the maximum file size.

As long as the maximum file size is 2048, 4096 or 6144 it works like a charm: it splits the PDF file into parts and then deletes all the separate page files.
However, when I choose a maximum of 7168 (i.e. 7 MB) there is a problem. It creates part 1 and part 3 but "forgets" to create part 2. Moreover, it doesn't delete all the page files.
Please see the attached picture.

I have also tried 8192 (8 MB) and I still get the same problem: it creates part 1, skips part 2, then creates part 3.
 
The Debug.Print output in the Immediate window should show whether it is skipping files. A command line run through cmd.exe can be at most 8191 characters long, so if the folder path is long this limit could be exceeded, because the code repeats the folder path for every _Page_nnn.pdf file (and a larger maximum size means more page files in each command). If the limit is being exceeded, the code can be adjusted to fix this.

What is the size of the input PDF and how many pages does it have?
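One possible adjustment, sketched here under assumptions rather than as the definitive fix: run PDFtk with the PDF folder as the current directory (set through the WshShell CurrentDirectory property, assuming the launched command inherits it) so that each page file can be passed by name alone, and check the command length before running it. CatCommandRelative and FitsCommandLine are hypothetical helpers; the 8191 figure is the cmd.exe limit mentioned above, applied here as a simple check on the whole string.

```vba
Option Explicit

Private Const Q As String = """"
Private Const MAX_CMD_LEN As Long = 8191   'cmd.exe command-line limit, checked against the whole string

'Builds the PDFtk cat command with bare file names (e.g. "file_Page_001.pdf") instead of full paths.
'Only valid if the command runs in the PDF folder, e.g. after Wsh.CurrentDirectory = PDFfolder.
Public Function CatCommandRelative(ByVal pageFileNames As Collection, ByVal outputName As String) As String
    Dim f As Variant, cmd As String
    cmd = "cmd /c PDFtk "
    For Each f In pageFileNames
        cmd = cmd & Q & f & Q & " "
    Next f
    CatCommandRelative = cmd & "cat output " & Q & outputName & Q
End Function

'True if the command is short enough for cmd.exe
Public Function FitsCommandLine(ByVal cmdLine As String) As Boolean
    FitsCommandLine = (Len(cmdLine) <= MAX_CMD_LEN)
End Function
```

In the macro above, this would mean setting Wsh.CurrentDirectory = PDFfolder once after creating Wsh, appending Q & pageFile & Q & " " (without PDFfolder) to pageFiles, and also closing the current part when the next name would push the command past the limit. The DEL command is built from the same list, so it benefits from the same change.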
 
