Introducing PDFUtil – Compare two PDF files textually or visually

In my project, I need to compare tons of PDF files. I could not find a good free library that works out of the box for comparing PDF files. I did not want a text-only comparison; I was looking for something that can compare PDFs pixel by pixel to find all the differences. The libraries that can do this are not free.

So I have come up with a simple Java library (built on Apache PDFBox, which is licensed under the Apache License, Version 2.0) that can compare PDF documents in text or image mode and highlight the differences, extract images from PDF documents, save PDF pages as images, and so on.

Maven Dependency:

Include the below dependency in your POM file.

Download:

PDF compare utility with all the dependencies.


taguru-pdf-utility-v1.1.zip (44476 downloads)

Github:

The source code for this project is here.

Usage:

import com.testautomationguru.utility.PDFUtil;

PDFUtil pdfUtil = new PDFUtil();
pdfUtil.getPageCount("c:/sample.pdf"); //returns the page count

//returns the pdf content - all pages
pdfUtil.getText("c:/sample.pdf"); 

// returns the pdf content from page number 2
pdfUtil.getText("c:/sample.pdf",2); 

// returns the pdf content from page number 5 to 8
pdfUtil.getText("c:/sample.pdf", 5, 8);

//set the path where we need to store the images
pdfUtil.setImageDestinationPath("c:/imgpath");
// extracts and saves all the images in the pdf
pdfUtil.extractImages("c:/sample.pdf");

// extracts and saves the images from page number 3
pdfUtil.extractImages("c:/sample.pdf", 3);

// extracts and saves the images from page 2 only
pdfUtil.extractImages("c:/sample.pdf", 2, 2);

//set the path where we need to store the images
pdfUtil.setImageDestinationPath("c:/imgpath");
// saves the pages of the pdf as images
pdfUtil.savePdfAsImage("c:/sample.pdf");

String file1="c:/files/doc1.pdf";
String file2="c:/files/doc2.pdf";

// compares the pdf documents and returns a boolean
// true if both files have same content. false otherwise.
pdfUtil.compare(file1, file2);

// compare the 3rd page alone
pdfUtil.compare(file1, file2, 3, 3);

// compare the pages from 1 to 5
pdfUtil.compare(file1, file2, 1, 5);

String file1="c:/files/doc1.pdf";
String file2="c:/files/doc2.pdf";

//pass all the possible texts to be removed before comparing
pdfUtil.excludeText("1998", "testautomation");

//pass regex patterns to be removed before comparing
// \\d+ removes all the numbers in the pdf before comparing
pdfUtil.excludeText("\\d+");

// compares the pdf documents and returns a boolean
// true if both files have same content. false otherwise.
pdfUtil.compare(file1, file2);

// compare the 3rd page alone
pdfUtil.compare(file1, file2, 3, 3);

// compare the pages from 1 to 5
pdfUtil.compare(file1, file2, 1, 5);
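
To make the idea behind excluding text a little more concrete, here is a minimal sketch of removing regex matches from two strings before comparing them. It is not the PDFUtil implementation; the class name ExcludeTextSketch and the methods stripPatterns and equalIgnoring are hypothetical names used only for illustration, and the input is assumed to be text that has already been extracted from the PDFs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: hypothetical helper, not part of PDFUtil.
class ExcludeTextSketch {

    // Removes every match of each regex pattern from the given text.
    static String stripPatterns(String text, List<String> patterns) {
        String result = text;
        for (String pattern : patterns) {
            result = Pattern.compile(pattern).matcher(result).replaceAll("");
        }
        return result;
    }

    // Treats two texts as equal if they match once the excluded patterns are removed.
    static boolean equalIgnoring(String first, String second, List<String> patterns) {
        return stripPatterns(first, patterns).equals(stripPatterns(second, patterns));
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList("\\d+");

        // Only the numbers differ, so the texts match after exclusion.
        System.out.println(equalIgnoring("Invoice 1998 total", "Invoice 2024 total", patterns)); // true

        // A word differs, so the texts still do not match.
        System.out.println(equalIgnoring("Invoice 1998 total", "Receipt 1998 total", patterns)); // false
    }
}
```

This kind of exclusion is handy for content that changes on every run, such as dates or generated IDs, which would otherwise make two equivalent documents look different.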

String file1="c:/files/doc1.pdf";
String file2="c:/files/doc2.pdf";

// Default is CompareMode.TEXT_MODE; switch to visual (pixel by pixel) comparison
pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);

// compares the pdf documents and returns a boolean
// true if both files have same content. false otherwise.
pdfUtil.compare(file1, file2);

// compare the 3rd page alone
pdfUtil.compare(file1, file2, 3, 3);

// compare the pages from 1 to 5
pdfUtil.compare(file1, file2, 1, 5);

// if you need to store the result as images with the differences highlighted
pdfUtil.highlightPdfDifference(true);
pdfUtil.setImageDestinationPath("c:/imgpath");
pdfUtil.compare(file1, file2);

For example, I have two PDF documents that have exactly the same content except for some differences in the charts.

For these documents, my PDFUtil highlights the differences in magenta by default. The color can be changed.
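
As a rough illustration of what a pixel by pixel comparison with highlighting involves, here is a minimal sketch using only the standard java.awt classes. It is not the code inside PDFUtil; the class PixelDiffSketch and the method highlightDifferences are hypothetical, and the sketch assumes both pages have already been rendered to images of the same size.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

// Illustration only: hypothetical helper, not part of PDFUtil.
class PixelDiffSketch {

    // Paints every pixel of 'actual' that differs from 'expected' in the
    // highlight color (in place) and returns the number of differing pixels.
    static int highlightDifferences(BufferedImage expected, BufferedImage actual, Color highlight) {
        if (expected.getWidth() != actual.getWidth() || expected.getHeight() != actual.getHeight()) {
            throw new IllegalArgumentException("Images must have the same dimensions");
        }
        int differing = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    actual.setRGB(x, y, highlight.getRGB());
                    differing++;
                }
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        // Two small all-black images stand in for rendered PDF pages.
        BufferedImage page1 = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage page2 = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);

        // Introduce a single white pixel in the second page.
        page2.setRGB(2, 1, Color.WHITE.getRGB());

        int differing = highlightDifferences(page1, page2, Color.MAGENTA);
        System.out.println("Differing pixels: " + differing); // Differing pixels: 1
        System.out.println("Pages identical: " + (differing == 0)); // Pages identical: false
    }
}
```

A count of zero means the two pages are visually identical. Any other value means the highlighted copy can be saved as an image for review.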

 

Features to be added soon:

 
