Merging Two Or More PDFs Using Lowagie iText API

In this Java tutorial, I use the Lowagie iText API to merge two or more PDF documents into one. Most of my comments on the program are in the code itself. Please note that you need to download the Lowagie iText library to run this program. For some reason I was able to run it only on JDK 1.4, though the official iText website says that it can run on any version later than 1.4. (It could be a problem local to my environment, so please try a higher JDK as well if you are using one.)

package com.kushal.pdf;

/**
 * @Author Kushal Paudyal
 * www.sanjaal.com/java
 * Last Modified On 2009-10-07
 *
 * This requires JDK version 1.4.
 * I had problems running it on some later versions.
 */
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.pdf.PRAcroForm;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.SimpleBookmark;

public class PDFMerger {

	public static void main(String args[]) throws IOException,
			DocumentException {
		String fileOne = "C:/temp/myTimeInOut.pdf";
		String fileTwo = "C:/temp/myGanttChart.pdf";
		String mergedFileLocation = "C:/temp/myMergedData.pdf";

		String filesToBeMerged[] = new String[] { fileOne, fileTwo };

		mergeMyFiles(filesToBeMerged, mergedFileLocation);
	}

	/**
	 * Tool that can be used to concatenate
	 * any number of existing PDF files into one.
	 */
	public static void mergeMyFiles(String filesToBeMerged[],
			String mergedFileLocation) {

		System.out.println("Starting To Merge Files...");
		System.out.println("Total Number Of Files To Be Merged..."
				+ filesToBeMerged.length + "\n");
		try {
			int pageOffset = 0;
			ArrayList masterBookMarkList = new ArrayList();

			int fileIndex = 0;
			String outFile = mergedFileLocation;
			Document document = null;
			PdfCopy writer = null;
			PdfReader reader = null;

			for (fileIndex = 0; fileIndex < filesToBeMerged.length; fileIndex++) {

				/**
				 * Create a reader for the file that we are reading
				 */
				reader = new PdfReader(filesToBeMerged[fileIndex]);
				System.out.println("Reading File -" + filesToBeMerged[fileIndex]);

				/**
				 * Replace all the local named links with the actual destinations.
				 */
				reader.consolidateNamedDestinations();

				/**
				 * Retrieve the total number of pages for this document
				 */
				int totalPages = reader.getNumberOfPages();

				/**
				 * Get the list of bookmarks for the current document.
				 * If there are bookmarks, shift their page numbers by the
				 * number of pages already merged and store them in a master list.
				 */
				System.out.println("Checking for bookmarks...");
				List bookmarks = SimpleBookmark.getBookmark(reader);
				if (bookmarks != null) {
					if (pageOffset != 0) {
						SimpleBookmark.shiftPageNumbers(bookmarks, pageOffset,
								null);
					}
					masterBookMarkList.addAll(bookmarks);
					System.out.println("Bookmarks found and storing...");
				} else {
					System.out.println("No bookmarks in this file...");
				}
				pageOffset += totalPages;

				/**
				 * The output document is created while processing the first file.
				 * If we are passing file1, file2 and file3,
				 * the pages of file2 and file3 are appended after those of file1.
				 */
				if (fileIndex == 0) {
					/**
					 * Create the document object from the reader
					 */
					document = new Document(reader.getPageSizeWithRotation(1));

					/**
					 * Create a PDF writer that listens to this document.
					 * Any changes to this document will be written to the file.
					 *
					 * outFile is the location where the final merged document
					 * will be written to.
					 */
					System.out.println("Creating an empty PDF...");
					writer = new PdfCopy(document,
							new FileOutputStream(outFile));
					/**
					 * Open this document
					 */
					document.open();
				}
				/**
				 * Add the content of the file into this document (writer).
				 * Loop through multiple pages.
				 */
				System.out.println("Merging File: " + filesToBeMerged[fileIndex]);
				PdfImportedPage page;
				for (int currentPage = 1; currentPage <= totalPages; currentPage++) {
					page = writer.getImportedPage(reader, currentPage);
					writer.addPage(page);
				}

				/**
				 * This will get the document's AcroForm.
				 * This will return null if no AcroForm is part of the document.
				 *
				 * AcroForms are PDFs that have been turned into fillable forms.
				 */
				System.out.println("Checking for Acroforms");
				PRAcroForm form = reader.getAcroForm();
				if (form != null) {
					writer.copyAcroForm(reader);
					System.out.println("Acroforms found and copied");
				} else {
					System.out.println("Acroforms not found for this file");
				}

				System.out.println();
			}
			/**
			 * After looping through all the files, add the master bookmark list.
			 * If individual PDF documents had separate bookmarks, the master
			 * bookmark list will contain a combination of all those bookmarks
			 * in the merged document.
			 */
			if (!masterBookMarkList.isEmpty()) {
				writer.setOutlines(masterBookMarkList);
				System.out.println("All bookmarks combined and added");
			} else {
				System.out.println("No bookmarks to add in the new file");
			}

			/**
			 * Finally close the main document, which will trigger the PdfCopy
			 * to write back to the filesystem.
			 */
			document.close();

			System.out.println("File has been merged and written to-" + mergedFileLocation);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}


The output of this program:

Starting To Merge Files...
Total Number Of Files To Be Merged...2

Reading File -C:/temp/myTimeInOut.pdf
Checking for bookmarks...
No bookmarks in this file...
Creating an empty PDF...
Merging File: C:/temp/myTimeInOut.pdf
Checking for Acroforms
Acroforms not found for this file

Reading File -C:/temp/myGanttChart.pdf
Checking for bookmarks...
No bookmarks in this file...
Merging File: C:/temp/myGanttChart.pdf
Checking for Acroforms
Acroforms not found for this file

No bookmarks to add in the new file
File has been merged and written to-C:/temp/myMergedData.pdf
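
The bookmark handling in mergeMyFiles depends on a running page offset: bookmarks from each file are shifted by the number of pages that were merged before it. The following is a minimal sketch of just that arithmetic, without iText. The class name PageOffsets, both method names, and the page counts 3, 5 and 2 are made up for illustration, and the sketch assumes a JDK newer than 1.4.

```java
class PageOffsets {

	/**
	 * For each input file, returns how many pages precede it in the merged
	 * document. This mirrors the pageOffset variable in mergeMyFiles, which
	 * is read before the current file's page count is added to it.
	 */
	static int[] startingOffsets(int[] pageCounts) {
		int[] offsets = new int[pageCounts.length];
		int pageOffset = 0;
		for (int i = 0; i < pageCounts.length; i++) {
			offsets[i] = pageOffset;
			pageOffset += pageCounts[i];
		}
		return offsets;
	}

	/** Page number in the merged document for a 1-based page of one input file. */
	static int mergedPageNumber(int offset, int localPage) {
		return offset + localPage;
	}

	public static void main(String[] args) {
		// Hypothetical page counts of three input files.
		int[] pageCounts = { 3, 5, 2 };
		int[] offsets = startingOffsets(pageCounts);
		for (int i = 0; i < offsets.length; i++) {
			System.out.println("File " + (i + 1) + ": bookmarks shift by "
					+ offsets[i] + " pages");
		}
	}
}
```

With these page counts, a bookmark that points to page 2 of the second file would point to page 5 of the merged document, because the first file contributes 3 pages.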

About Lowagie iText:

iText is a library that allows you to generate PDF files on the fly.

iText is an ideal library for developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation. iText is not an end-user tool. Typically you won’t use it on your Desktop as you would use Acrobat or any other PDF application. Rather, you’ll build iText into your own applications so that you can automate the PDF creation and manipulation process. For instance in one or more of the following situations:

  • Due to time or size, the PDF documents can’t be produced manually.
  • The content of the document must be calculated or based on user input.
  • The content needs to be customized or personalized.
  • The PDF content needs to be served in a web environment.
  • Documents are to be created in “batch process” mode.

You can use iText to:

  • Serve PDF to a browser
  • Generate dynamic documents from XML files or databases
  • Use PDF’s many interactive features
  • Add bookmarks, page numbers, watermarks, etc.
  • Split, concatenate, and manipulate PDF pages
  • Automate filling out of PDF forms
  • Add digital signatures to a PDF file
  • And much more…

In short: the iText classes are very useful for people who need to generate read-only, platform independent documents containing text, lists, tables and images; or who want to perform specific manipulations on existing PDF documents. The library is especially useful in combination with Java(TM) technology-based Servlets; there’s also a .NET port available: iTextSharp (written in C#).

iText requires JDK 1.4. It’s available for free under a multiple license: MPL and LGPL.

Splitting PDF File Using Java iText API Into Multiple PDFs

Previously I wrote a tutorial about how to merge two or more PDF files. This tutorial does the opposite: it shows how to split a multi-page PDF into multiple PDFs using the Java iText API from Lowagie. You will need the iText library to run this program; you can download it from www.lowagie.com/iText/

This program takes two parameters as input, both of which are defined inside the main method.

  • The first parameter is the full path of the PDF file that needs to be split.
  • The second parameter is the number of pages each split should have.

For example, if you have a PDF of 15 pages, you might want to split it into parts of 4 pages each. There will be 4 splits in total: the first three will have 4 pages each, while the last will have 3 pages (4+4+4+3=15).
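
The arithmetic in this example can be sketched on its own, without iText. The class name SplitRanges and the method pageRanges are made up for illustration and are not part of the program below. The sketch also uses generics, so it assumes a JDK newer than 1.4.

```java
import java.util.ArrayList;
import java.util.List;

class SplitRanges {

	/**
	 * Returns one {firstPage, lastPage} pair per split. Page numbers are
	 * 1-based and inclusive, matching the page numbering iText uses.
	 */
	static List<int[]> pageRanges(int totalPages, int pagesPerSplit) {
		if (totalPages < 0 || pagesPerSplit < 1) {
			throw new IllegalArgumentException(
					"totalPages must be >= 0 and pagesPerSplit must be >= 1");
		}
		List<int[]> ranges = new ArrayList<>();
		for (int first = 1; first <= totalPages; first += pagesPerSplit) {
			// The last split may be shorter than pagesPerSplit.
			int last = Math.min(first + pagesPerSplit - 1, totalPages);
			ranges.add(new int[] { first, last });
		}
		return ranges;
	}

	public static void main(String[] args) {
		for (int[] range : pageRanges(15, 4)) {
			int count = range[1] - range[0] + 1;
			System.out.println("pages " + range[0] + "-" + range[1]
					+ " (" + count + " pages)");
		}
	}
}
```

For 15 pages and 4 pages per split, this prints the ranges 1-4, 5-8, 9-12 and 13-15, i.e. three splits of 4 pages and a final split of 3 pages.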

To summarize, the following program demonstrates:

  • Using the iText API to read a PDF file
  • How to find the total number of pages in the PDF
  • How to use the PdfCopy and PdfImportedPage features
  • PDF splitting logic
  • How to trigger the PDF file writing using PdfCopy

Fully Compiled and Tested Source Code:

package com.kushal.pdf;

/**
 * @Author Kushal Paudyal
 * www.sanjaal.com/java
 * Last Modified On: 2009-11-04
 *
 * PDFSplitter.java
 * Split any PDF file into multiple PDFs
 */
import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;

public class PDFSplitter {

	public static void main(String[] args) {

		/**
		 * Location of the input file that is to be split.
		 */
		String fileToSplit = "C:/temp/general/MyWebReport.pdf";

		/**
		 * Number of pages in each split file
		 *
		 * e.g. 4 pages each in the split.
		 */
		int splittedPageSize = 4;

		/**Call the split method with filename and page size as params**/
		splitPDFFile(fileToSplit, splittedPageSize);

	}

	/**
	 * @param fileName : PDF file that has to be split
	 * @param splittedPageSize : Number of pages in each split file
	 */
	public static void splitPDFFile(String fileName, int splittedPageSize) {
		try {
			/**
			 * Read the input PDF file
			 */
			PdfReader reader = new PdfReader(fileName);
			System.out.println("Successfully read input file: " + fileName
					+ "\n");
			int totalPages = reader.getNumberOfPages();
			System.out.println("There are total " + totalPages
					+ " pages in this input file\n");
			int split = 0;

			/**
			 * Note: Page numbers start from 1 to n (not 0 to n-1)
			 */
			for (int pageNum = 1; pageNum <= totalPages; pageNum += splittedPageSize) {
				split++;
				String outFile = fileName
						.substring(0, fileName.lastIndexOf(".pdf"))
						+ "-split-" + split + ".pdf";
				Document document = new Document(reader
						.getPageSizeWithRotation(1));
				PdfCopy writer = new PdfCopy(document, new FileOutputStream(
						outFile));
				document.open();
				/**
				 * Each split contains up to splittedPageSize pages.
				 *
				 * E.g. we are splitting a 15-page PDF into splits of 4 pages each.
				 * In this example, the last split will have only 3 pages (4+4+4+3 = 15).
				 *
				 * Note the loop condition below, which handles the case where the
				 * number of pages left for a split is less than splittedPageSize.
				 * That can only happen for the last split.
				 *
				 * offset < splittedPageSize && (pageNum + offset) <= totalPages
				 */
				int tempPageCount = 0;
				for (int offset = 0; offset < splittedPageSize
						&& (pageNum + offset) <= totalPages; offset++) {
					PdfImportedPage page = writer.getImportedPage(reader,
							pageNum + offset);
					writer.addPage(page);
					tempPageCount++;
				}

				/**
				 * Closing the document triggers the PDF file being written to the
				 * file system; the explicit writer.close() that follows is a safeguard.
				 */
				document.close();
				writer.close();

				System.out.println("Split: [" + tempPageCount + " page]: "
						+ outFile);

			}

		} catch (Exception e) {
			e.printStackTrace();
		}
	}
/*
	 * SANJAAL CORPS MAKES NO REPRESENTATIONS OR WARRANTIES ABOUT THE SUITABILITY OF 
	 * THE SOFTWARE, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED 
	 * TO THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 
	 * PARTICULAR PURPOSE, OR NON-INFRINGEMENT. SANJAAL CORPS SHALL NOT BE LIABLE FOR 
	 * ANY DAMAGES SUFFERED BY LICENSEE AS A RESULT OF USING, MODIFYING OR 
	 * DISTRIBUTING THIS SOFTWARE OR ITS DERIVATIVES. 
	 * 
	 * THIS SOFTWARE IS NOT DESIGNED OR INTENDED FOR USE OR RESALE AS ON-LINE 
	 * CONTROL EQUIPMENT IN HAZARDOUS ENVIRONMENTS REQUIRING FAIL-SAFE 
	 * PERFORMANCE, SUCH AS IN THE OPERATION OF NUCLEAR FACILITIES, AIRCRAFT 
	 * NAVIGATION OR COMMUNICATION SYSTEMS, AIR TRAFFIC CONTROL, DIRECT LIFE 
	 * SUPPORT MACHINES, OR WEAPONS SYSTEMS, IN WHICH THE FAILURE OF THE 
	 * SOFTWARE COULD LEAD DIRECTLY TO DEATH, PERSONAL INJURY, OR SEVERE 
	 * PHYSICAL OR ENVIRONMENTAL DAMAGE ("HIGH RISK ACTIVITIES"). SANJAAL CORPS 
	 * SPECIFICALLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR 
	 * HIGH RISK ACTIVITIES. 
	 */
}

Output of this program:

Successfully read input file: C:/temp/general/MyWebReport.pdf

There are total 15 pages in this input file

Split: [4 page]: C:/temp/general/MyWebReport-split-1.pdf
Split: [4 page]: C:/temp/general/MyWebReport-split-2.pdf
Split: [4 page]: C:/temp/general/MyWebReport-split-3.pdf
Split: [3 page]: C:/temp/general/MyWebReport-split-4.pdf
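
The output file names above are built by cutting off the ".pdf" extension and appending "-split-" plus the split number. The following is a small sketch of a more defensive variant of that naming step. It is not the code used in PDFSplitter; the class name SplitNames and the method splitFileName are made up for illustration. It assumes you also want to accept an upper-case ".PDF" extension or a file name with no extension at all.

```java
class SplitNames {

	/**
	 * Builds the name of the n-th split file, e.g. "report.pdf" and 2
	 * give "report-split-2.pdf". A trailing ".pdf" is removed regardless
	 * of case; a name without that extension is used as the base as-is.
	 */
	static String splitFileName(String fileName, int split) {
		String extension = ".pdf";
		int extensionStart = fileName.length() - extension.length();
		// regionMatches returns false when extensionStart is negative,
		// so names shorter than the extension are handled safely.
		boolean hasPdfExtension = fileName.regionMatches(true, extensionStart,
				extension, 0, extension.length());
		String base = hasPdfExtension
				? fileName.substring(0, extensionStart)
				: fileName;
		return base + "-split-" + split + ".pdf";
	}

	public static void main(String[] args) {
		System.out.println(splitFileName("C:/temp/general/MyWebReport.pdf", 1));
	}
}
```

Only the end of the name is inspected, so a directory whose name happens to contain ".pdf" does not affect the result.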

[Screenshot: original and split PDFs in File Explorer]