In my project, I need to compare tons of PDF files. I could not find any good free library that works out of the box for comparing PDF files. I did not want just a text comparison; I was looking for something that can compare PDFs pixel by pixel to find all the differences. The libraries that can do this are not free.
So, I have come up with a simple Java library (using Apache PDFBox, licensed under the Apache License, Version 2.0) which can compare given PDF documents in text or image mode and highlight the differences, extract images from PDF documents, save PDF pages as images, etc.
Maven Dependency:
Include the below dependency in your POM file.
Download:
PDF compare utility with all the dependencies.
taguru-pdf-utility-v1.1.zip
Github:
The source code for this project is here: https://github.com/vinsguru/pdf-util
Usage:
- To get page count
import com.testautomationguru.utility.PDFUtil;
PDFUtil pdfUtil = new PDFUtil();
pdfUtil.getPageCount("c:/sample.pdf"); //returns the page count
- To get page content as plain text
//returns the pdf content - all pages
pdfUtil.getText("c:/sample.pdf");
// returns the pdf content from page number 2
pdfUtil.getText("c:/sample.pdf",2);
// returns the pdf content from page number 5 to 8
pdfUtil.getText("c:/sample.pdf", 5, 8);
- To extract embedded images from a PDF
//set the path where the images should be stored
pdfUtil.setImageDestinationPath("c:/imgpath");
pdfUtil.extractImages("c:/sample.pdf");
// extracts and saves the images from page number 3
pdfUtil.extractImages("c:/sample.pdf", 3);
// extracts and saves the images from page 2 only
pdfUtil.extractImages("c:/sample.pdf", 2, 2);
- To store PDF pages as images
//set the path where we need to store the images
pdfUtil.setImageDestinationPath("c:/imgpath");
pdfUtil.savePdfAsImage("c:/sample.pdf");
- To compare PDF files in text mode (faster, but it does not compare formatting, images, etc.)
String file1="c:/files/doc1.pdf";
String file2="c:/files/doc2.pdf";
// compares the pdf documents and returns a boolean
// true if both files have same content. false otherwise.
pdfUtil.compare(file1, file2);
// compare the 3rd page alone
pdfUtil.compare(file1, file2, 3, 3);
// compare the pages from 1 to 5
pdfUtil.compare(file1, file2, 1, 5);
- To exclude certain text while comparing PDF files in text mode
String file1="c:/files/doc1.pdf";
String file2="c:/files/doc2.pdf";
//pass all the possible texts to be removed before comparing
pdfUtil.excludeText("1998", "testautomation");
//pass regex patterns to be removed before comparing
// \\d+ removes all the numbers in the pdf before comparing
pdfUtil.excludeText("\\d+");
// compares the pdf documents and returns a boolean
// true if both files have same content. false otherwise.
pdfUtil.compare(file1, file2);
// compare the 3rd page alone
pdfUtil.compare(file1, file2, 3, 3);
// compare the pages from 1 to 5
pdfUtil.compare(file1, file2, 1, 5);
- To compare PDF files in visual mode (slower; compares the PDF documents pixel by pixel, highlights the differences and stores the result as an image)
import com.testautomationguru.utility.CompareMode;
String file1="c:/files/doc1.pdf";
String file2="c:/files/doc2.pdf";
// Default is CompareMode.TEXT_MODE, so switch to visual mode first
pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);
// compares the pdf documents and returns a boolean
// true if both files have same content. false otherwise.
pdfUtil.compare(file1, file2);
// compare the 3rd page alone
pdfUtil.compare(file1, file2, 3, 3);
// compare the pages from 1 to 5
pdfUtil.compare(file1, file2, 1, 5);
//if you need to store the result
pdfUtil.highlightPdfDifference(true);
pdfUtil.setImageDestinationPath("c:/imgpath");
pdfUtil.compare(file1, file2);
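Conceptually, the text-mode exclusions listed above amount to deleting every excluded pattern from both extracted texts before checking them for equality. The following JDK-only sketch illustrates that idea under the assumption that each excluded value is treated as a regular expression; TextExclusion, normalize and equalsIgnoring are hypothetical names, not part of PDFUtil, whose actual implementation may differ.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating "remove excluded text, then compare".
// Assumption: every exclusion is interpreted as a regular expression.
final class TextExclusion {

    private TextExclusion() {}

    // Removes every match of every exclusion regex from the text.
    static String normalize(String text, String... exclusions) {
        String result = text;
        for (String regex : exclusions) {
            result = Pattern.compile(regex).matcher(result).replaceAll("");
        }
        return result;
    }

    // True when both texts are identical once the excluded patterns are removed.
    static boolean equalsIgnoring(String text1, String text2, String... exclusions) {
        return normalize(text1, exclusions).equals(normalize(text2, exclusions));
    }
}
```

For example, TextExclusion.equalsIgnoring("Report 1998", "Report 2017", "\\d+") returns true because both texts reduce to "Report ". Literal text containing regex metacharacters (such as "." or "(") would need Pattern.quote first.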
For example, I have 2 PDF documents which have exactly the same content except for the differences in the charts below.
PDFUtil gives the result shown below (differences are highlighted in magenta by default; the color can be changed).
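Under the hood, a visual comparison of this kind renders each page to an image and compares the two images pixel by pixel, painting mismatching pixels in a highlight color. The sketch below shows only that comparison step with the JDK's BufferedImage; it assumes both pages have already been rendered at the same size (for example with PDFBox's PDFRenderer, omitted here). PixelDiff and compareAndHighlight are hypothetical names, not PDFUtil's API, and the color parameter merely mirrors the library's magenta default.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

// Hypothetical sketch: pixel-by-pixel comparison of two already-rendered pages.
final class PixelDiff {

    private PixelDiff() {}

    // Returns a copy of 'actual' with every differing pixel painted in 'highlight',
    // or null when the two images are identical.
    static BufferedImage compareAndHighlight(BufferedImage expected, BufferedImage actual, Color highlight) {
        if (expected.getWidth() != actual.getWidth() || expected.getHeight() != actual.getHeight()) {
            throw new IllegalArgumentException("Images must have the same dimensions");
        }
        BufferedImage result = new BufferedImage(actual.getWidth(), actual.getHeight(), BufferedImage.TYPE_INT_RGB);
        boolean different = false;
        for (int y = 0; y < actual.getHeight(); y++) {
            for (int x = 0; x < actual.getWidth(); x++) {
                int rgbExpected = expected.getRGB(x, y);
                int rgbActual = actual.getRGB(x, y);
                if (rgbExpected != rgbActual) {
                    result.setRGB(x, y, highlight.getRGB()); // mark the mismatch
                    different = true;
                } else {
                    result.setRGB(x, y, rgbActual); // keep the original pixel
                }
            }
        }
        return different ? result : null;
    }
}
```

The explicit size check matters: BufferedImage.getRGB throws an ArrayIndexOutOfBoundsException for coordinates outside the image, so two renderings of different sizes cannot be compared blindly.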
Features to be added soon:
- While comparing PDFs in VISUAL_MODE, ignore certain areas.
- While comparing PDFs in VISUAL_MODE, return true / false based on a certain threshold / sensitivity.
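Both planned features can be sketched on top of a pixel comparison: pixels inside ignored rectangles are skipped, and two pages count as equal when the share of differing pixels stays at or below a threshold. This is a hypothetical JDK-only illustration (TolerantPixelCompare, mismatchRatio and matches are invented names), not the library's implementation; treating a size mismatch as a complete mismatch is also an assumption.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical sketch of the two planned VISUAL_MODE features:
// ignoring regions and passing within a mismatch threshold.
final class TolerantPixelCompare {

    private TolerantPixelCompare() {}

    // Fraction (0.0 to 1.0) of compared pixels that differ, skipping ignored regions.
    static double mismatchRatio(BufferedImage expected, BufferedImage actual, List<Rectangle> ignored) {
        if (expected.getWidth() != actual.getWidth() || expected.getHeight() != actual.getHeight()) {
            return 1.0; // assumption: a size mismatch counts as a complete mismatch
        }
        long compared = 0;
        long different = 0;
        for (int y = 0; y < actual.getHeight(); y++) {
            for (int x = 0; x < actual.getWidth(); x++) {
                if (isIgnored(x, y, ignored)) {
                    continue;
                }
                compared++;
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    different++;
                }
            }
        }
        return compared == 0 ? 0.0 : (double) different / compared;
    }

    // True when the mismatch ratio does not exceed the allowed threshold.
    static boolean matches(BufferedImage expected, BufferedImage actual, List<Rectangle> ignored, double threshold) {
        return mismatchRatio(expected, actual, ignored) <= threshold;
    }

    private static boolean isIgnored(int x, int y, List<Rectangle> ignored) {
        for (Rectangle area : ignored) {
            if (area.contains(x, y)) {
                return true;
            }
        }
        return false;
    }
}
```

For instance, a header band holding a generation date could be ignored with new Rectangle(0, 0, pageWidth, 60), where pageWidth and the 60-pixel height are illustrative values, and a threshold of 0.001 would tolerate up to 0.1% differing pixels.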
Your library is very promising. Is it open source? Can I find it on GitHub?
It is not on GitHub yet, but I have no issue sharing it with others. Please give me some time; I will share it with you ASAP.
Hi, there have been many requests for the source code to be shared. This is another +1.
Hope you can get to it sometime. While on the subject, I think it would be nice if you shared/released the source code of future tools/utilities for which you offer a binary download (if/where you have no reservations or restrictions about sharing the source). It’s a lot easier to do when you make that the intent from the beginning.
And the lamest but still good approach would be to just tar/zip up the source code (ideally with an OSS license) and offer that for download in addition to the binary, if you don’t want to deal with git/source control.
Hi,
I’m also interested in your library. How can I get the source?
Thx,
Thanks for such a wonderful explanation.
Can you please share the library with me by email?
Thanks
Sachin A
India
One more question.
Can we compare text styles (bold, italic, size, font type, etc.)?
Unfortunately, text compare does not check styles 🙁. We need to use visual compare for that.
Can you push it to GitHub? It has a lot of potential, and I would like to contribute to your code. Thanks – Abhishek
Can you mail me the documentation of the code for easier understanding?
I also have a similar project, and I find your work brilliant. It would be very useful if you shared how the comparison works and how the result is shown. Thanks in advance.
I am using Eclipse to run your code. The compare block throws the following error:
“Nov 02, 2015 6:01:39 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Nov 02, 2015 6:01:41 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC”
Where and how should I run this code to get the highlighted differences?
Also, is it possible for me to expose this PDF comparer as a web service?
The message comes from the PDFBox dependency. It is only an INFO message, so you can ignore it.
Check this.
https://mail-archives.apache.org/mod_mbox/pdfbox-users/201304.mbox/%3C81EE7009-C3C0-4B70-B4DF-13501B23D814@fileaffairs.de%3E
I am currently working on a PDF comparer project. We are working on highlighting the differences between two PDFs. The above code helped us compare PDFs, but the output is highlighted and overlapping. Can you please send us your source code so that we can make changes to the result?
I am also facing the same problem: the output file is highlighted and overlapping.
Yes, that is expected because it compares pixel by pixel: even a very small change gets highlighted, so the result can look like it overlaps. If you expect text mismatches, please use text compare.
Very nice util!
…one question…
Sometimes it’s nice to have a method that lets you exclude some part of the PDF file, by using page areas that one can select or deselect.
If you let me access the source, I can make some extensions for all of us.
Anyway, nice job!
Very nice. I want to display the changed file content on the console as a string. How can I do that?
You can get the content of the PDF as text. Then you can apply the logic yourself to find the mismatch. That should be very easy to implement.
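As a minimal illustration of applying that logic yourself, the following sketch compares two extracted texts line by line and describes each mismatch so it can be printed to the console. TextLineDiff is a hypothetical helper; it compares lines by position only, so a single inserted line shifts every later line, and a proper diff algorithm (for example one based on the longest common subsequence) would align them better.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: report line-level differences between two extracted texts.
final class TextLineDiff {

    private TextLineDiff() {}

    // Compares the texts line by line (by position) and describes each mismatch.
    static List<String> differences(String text1, String text2) {
        String[] lines1 = text1.split("\\R", -1); // \R matches any line break (Java 8+)
        String[] lines2 = text2.split("\\R", -1);
        List<String> result = new ArrayList<>();
        int max = Math.max(lines1.length, lines2.length);
        for (int i = 0; i < max; i++) {
            String left = i < lines1.length ? lines1[i] : "<missing>";
            String right = i < lines2.length ? lines2[i] : "<missing>";
            if (!left.equals(right)) {
                result.add("Line " + (i + 1) + ": [" + left + "] vs [" + right + "]");
            }
        }
        return result;
    }
}
```

In practice the two strings would come from pdfUtil.getText(file1) and pdfUtil.getText(file2), and each returned entry could be printed with System.out.println.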
Hi
We are using this. First, we should thank you for the great work you have provided. Thank you very much!
My two PDF documents have 16 differences, but the comparePdfFilesBinaryMode(file1, file2); method shows only 13 differences (screenshots). How can we overcome this problem?
Any suggestions? I am looking for optical character recognition (OCR) JAR files to overcome this.
Would you share the PDF files with me please?
The updated source code corrected this. Thank you.
Hi vlns, do you have the source code posted on, e.g., GitHub? I would like to use and contribute to your project too.
Hi,
I am trying to compare two files and get the following error:
Feb 05, 2016 12:59:10 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL
Feb 05, 2016 12:59:10 PM org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap getRGBImage
SEVERE: java.lang.NegativeArraySizeException
java.lang.NegativeArraySizeException
Looks like it is a problem with PDFBOX.jar.
From what I can see, you are using version 1.8.9,
but there is a version 2.0 RC.
Can you provide your tool with an updated PDFBox so I can check whether this fixes my problem?
Thanks in advance
Yes, that is right.
pdfbox.jar is a separate JAR in the PDFUtil download. You can just replace it with the latest one. Thanks for pointing it out.
Hi,
thanks for the info,
I updated pdfbox to version 2.0.0-RC3
and I get the following error:
Exception in thread “main” java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocument.load(Ljava/lang/String;)Lorg/apache/pdfbox/pdmodel/PDDocument;
at com.taguru.utility.PDFUtil.getPageCount(PDFUtil.java:160)
I am using IntelliJ IDEA 15 Community Edition.
Do you know how to fix this?
Somehow this comment went to spam. Not sure why.
Anyway, regarding your question: can you please check whether you can use pdfbox-app-1.8.11.jar?
Hi, great work. I am very interested. For our project, we need to skip some sections of the PDF when comparing. It would be helpful if you shared the source with us. Thank you.
Hi, can you please share your project/source?
Interested in your API for a PDF comparison requirement. I would be grateful if you could share the source/JAR file so I can try it.
SJS
Hi,
I also want to contribute to this code. Please share the GitHub link or the source.
Thanks,
Hi Vls,
Congratulations on creating such a nice tool.
Will you share the code? And if you are not supposed to share the code, can you at least tell us your intention?
It looks like you are not answering every request about sharing the code, so it is unclear whether you will actually do it.
Thanks
Simone
Source code is here: https://github.com/vinsguru/pdf-util
Wow, nice work! This really saves me a lot of work manually comparing hundreds of PDFs.
Currently it does not seem to run under Java 8. :-/ Is there an upgrade planned?
Hi. Could you share the source code with me?
Source code is here: https://github.com/vinsguru/pdf-util
I am getting a date format exception.
I have used the functions and the plugin, but we are not able to save the image as described in the last section, i.e., comparing two PDF files, highlighting the differences and writing them to an image file. Could you please help in this regard? The code is something like the following.
pdfutil.highlightPdfDifference(true);
pdfutil.setImageDestinationPath(Path+"//results//");
//pdfutil.savePdfAsImage(Path);
System.out.println(pdfutil.comparePdfFilesTextMode(Doc_BaseLine, Doc_Actual));
// pdfutil.comparePdfFilesBinaryMode(Doc_BaseLine, Doc_Actual);
System.out.println(pdfutil.comparePdfFilesBinaryMode(Doc_BaseLine, Doc_Actual));
//pdfutil.extractImages(Doc_Actual) ;
pdfutil.savePdfAsImage(Doc_Actual);
Can you enable logging and share the log, please?
Hi,
First of all, thank you. It helped me a lot. But for my project, I need to skip some sections of the PDF when comparing. It would be helpful if you shared the source with us so that I can make changes as needed.
Thank you
It can be done.
Source code is here: https://github.com/vinsguru/pdf-util
Hi,
I am not able to generate an image file if the two PDFs are totally different.
Can you share those PDF files?
Hi, rather than saving the compared image to a specific path, I want to download the compared PDF output image file. Is it possible? If yes, please suggest a solution. Thanks
Hi,
I’ve posted multiple comments here, but none are actually showing up. I would really like to use this; could you please help me?
1. After downloading the ZIP file (which contains 2 JARs), what are the exact steps to compare 2 PDFs?
2. Could you send me a link to the source code as well?
Thanks!
I was not even able to log in to my blog due to some issues related to a recent WordPress update, so I could not answer your question.
Check this link to see how to include the downloaded JAR files in Eclipse.
Once added, you should be able to use the code below:
import com.taguru.utility.PDFUtil;
PDFUtil pdfUtil = new PDFUtil();
pdfUtil.getPageCount("c:/sample.pdf"); //returns the page count
Thanks for your reply 🙂 I’ll try that out. Also, are you willing to share the source code?
Hello,
Could you please share the source code with me?
Source code is here: https://github.com/vinsguru/pdf-util
Wonderful information and Amazing explanation !!! 🙂
I have downloaded the ZIP file, and when I try to extract it, unfortunately it shows the error “Cannot Open file: it does not appear to be a valid archive”.
Could you please resend a valid ZIP file to my email address?
Thanks in Advance !!!
Please try downloading again; it works fine.
Thank you so much for sharing this tool! Would you be able to share the source code? I need to modify it to ignore certain parts of the file and to remove certain special Unicode characters from the PDF file before comparing, as they are throwing off the comparison.
Please let me know when you share it. Thanks.
Source code is here: https://github.com/vinsguru/pdf-util
Hello,
When will the code be available for this? Great work 🙂
Sorry for the delay 🙁
Source code is here: https://github.com/vinsguru/pdf-util
Hi
The library is promising. Can you share the source code with us? And can you point us to some documentation? Say I want to change the highlight colour from magenta to green; how do I do that?
Source code is here: https://github.com/vinsguru/pdf-util
Hi VLNS,
I’ve messaged multiple times asking whether the source code for this is available. Kindly let me know if it’s not, so that I can start my own implementation 🙂 I just don’t want to waste time implementing something from scratch if I can build on yours, so please let me know soon.
Thanks.
Sorry Mr. George for the delay.
Source code is here: https://github.com/vinsguru/pdf-util
Nice Tool!
Is there a way to declare wildcards in binary mode? I generate daily PDF reports with the current date on them. Text mode is not accurate enough for my PDF files.
It would be nice if you could declare wildcards (maybe a region of the file).
Hi
Can you share source code please?
Source code is here: https://github.com/vinsguru/pdf-util
Neat utility with great potential.
I think it would be really good if there was a way to ignore certain text by adding some Regex rules
Yes, I have been thinking about that too. I will work on it.
It’s great work indeed. If you don’t mind, can you share the source code so that we can contribute and use it for all possible requirements?
Source code is here: https://github.com/vinsguru/pdf-util
Hi, I tried to compare 2 PDFs pixel by pixel using the code below, but I am not getting the image that highlights the differences between the 2 PDFs.
//if you need to store the result
pdfUtil.highlightPdfDifference(true);
pdfUtil.setImageDestinationPath("c:/imgpath");
pdfUtil.comparePdfFilesBinaryMode(file1, file2);
Check the usage.
Source code is here: https://github.com/vinsguru/pdf-util
I would like to call the JAR file from VBScript or from UFT 12.53.
Can you please guide me? I have attached a sample VBScript, but I am unable to pass the command (getPageCount) and arguments (“c:/sample.pdf”).
Set WshShell = CreateObject("WScript.Shell")
dim a
a = "C:\PDF Compare\taguru-pdf-utility-v1.0\taguru-pdf-utility-v1.0\pdfbox-app-1.8.9.jar"
WshShell.Run "java -jar " & chr(34) & a & chr(34)
A possible usage would be: java -jar pdfutil.jar file1.pdf file2.pdf
Source code is here: https://github.com/vinsguru/pdf-util
Hi,
The function convertToImageAndCompare(String file1, String file2, int startPage, int endPage) has issues: it does not return anything, and I am also unable to write results to a folder with the following:
String file1 = "resources/July 16th.pdf";
String file2 = "resources/July 17th.pdf";
util.highlightPdfDifference(true);
util.setImageDestinationPath("/Users/test/Errors");
util.comparePdfFilesBinaryMode(file1, file2);
Do we need to give a file name? I tried to debug the code, but it only returns true or false. The code in convertToImageAndCompare is commented out, and comparePdfFilesBinaryMode calls convertToImageAndCompare, which does not return anything, and I get the error.
Thanks,
Jeevan
Please use the latest version. Source code is here: https://github.com/vinsguru/pdf-util
I am trying with pdfbox-app-2.0.2 and get the following error:
Exception in thread “main” java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocument.load(Ljava/lang/String;)Lorg/apache/pdfbox/pdmodel/PDDocument;
at com.taguru.utility.PDFUtil.getPageCount(PDFUtil.java:160)
at com.taguru.utility.PDFUtil.comparePdfByImage(PDFUtil.java:459)
at com.taguru.utility.PDFUtil.comparePdfFilesBinaryMode(PDFUtil.java:402)
at ERS.UnitTest.Reports.ComparePDFDocuments.Compare2Documents(ComparePDFDocuments.java:27)
at ERS.UnitTest.Reports.ComparePDFDocuments.main(ComparePDFDocuments.java:19)
Please advise.
Please use the latest version. Source code is here: https://github.com/vinsguru/pdf-util
This is really very nice. Please share the source code with me by email.
Source code is here: https://github.com/vinsguru/pdf-util
Nice!
Looks like this could be what I am looking for, could you please share the source code?
Source code is here: https://github.com/vinsguru/pdf-util
Can you please share the source code or post it on GitHub? I want to contribute to the project too. I think there have been many requests regarding this.
Source code is here: https://github.com/vinsguru/pdf-util
Hello,
I also would like to ask you about your tool.
Could you please share the source code or give me a link?
Thanks in advance!
Source code is here: https://github.com/vinsguru/pdf-util
Hi! How about the sources? I would really like to submit some enhancements and perhaps look into the regex excludes mentioned above.
It’s a nice utility! I just tried using it with the method ‘comparePdfFilesBinaryMode’. It compared the PDFs, but a result image is generated only for the first page of the PDF, even though there are differences on the second page too.
Please suggest if there is any other way to generate images for each page.
Yes, please check the API: you need to set the flag to compare all the pages. Otherwise, it will just return false as soon as it finds a mismatch and exit.
While comparing the PDFs, I want to ignore a few differences (like form IDs) and make them pass regardless of those kinds of differences.
Can you please share the source code so that I can make this change for my project?
I created PDFUtil, and it was poorly designed 🙁. That is the reason I am delaying posting it on GitHub. I will work on it and upload it to GitHub very soon.
Source code is here: https://github.com/vinsguru/pdf-util
If the PDFs are opened from a URL, how do we compare them?
Hmm, interesting! Well, the quick solution would be to download the PDFs and compare them.
This example shows how to download them:
http://stackoverflow.com/questions/20265740/how-to-download-a-pdf-from-a-given-url-in-java
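A minimal sketch of that approach using only the JDK: copy the PDF behind the URL into a temporary local file, then pass the local path to the comparison. PdfDownloader is a hypothetical helper, and it assumes the URL can be read without authentication or cookies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copy a PDF from a URL into a local temporary file
// so that it can then be passed to the comparison by path.
final class PdfDownloader {

    private PdfDownloader() {}

    static Path download(String url) throws IOException {
        Path target = Files.createTempFile("downloaded-", ".pdf");
        try (InputStream in = URI.create(url).toURL().openStream()) {
            // The temp file already exists, so it must be replaced.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Usage could then look like pdfUtil.compare(PdfDownloader.download(url1).toString(), PdfDownloader.download(url2).toString()), where url1 and url2 are placeholders for the two PDF links.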
Hi,
Is it possible to share a demo video on how to use this library to compare PDF files in visual mode? Or the steps to do this?
Thanks.
Please check the sample code. You can figure out yourself.
Hi,
I am running automation to compare more than one pair of PDFs. I would like to save all the compared images into an output folder, but your code seems to clean the folder before writing to it.
Yeah! I thought I should clear the folder, but you are right: this library should not do that. It is up to the user to decide whether to clear it or not. One easy option is to create a separate folder for each PDF under the output folder. Alternatively, the source code is available on GitHub: you can comment out the code that clears the output folder and build it. I have provided the build instructions.
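The separate-folder suggestion could look like the following sketch, which derives one subfolder per compared pair under a common results root. ResultFolders and forPair are hypothetical names, and the naming scheme (doc1_vs_doc2) is only an illustrative choice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper: give each PDF pair its own result folder, so that
// clearing the destination for one comparison does not delete earlier results.
final class ResultFolders {

    private ResultFolders() {}

    static Path forPair(Path root, String file1, String file2) throws IOException {
        String name = baseName(file1) + "_vs_" + baseName(file2);
        return Files.createDirectories(root.resolve(name));
    }

    // File name without directories and without a trailing ".pdf" extension.
    private static String baseName(String file) {
        String name = Paths.get(file).getFileName().toString();
        return name.toLowerCase(Locale.ROOT).endsWith(".pdf")
                ? name.substring(0, name.length() - 4)
                : name;
    }
}
```

Calling pdfUtil.setImageDestinationPath(ResultFolders.forPair(root, file1, file2).toString()) before each compare would then keep the images of earlier pairs intact.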
Is there a way to compare only part of the PDF, like excluding the date, which is usually present in the header?
Not for the time being! But you can do this yourself:
pdfUtil.getText("c:/sample.pdf").replaceAll("\\d{2}/\\d{2}/\\d{4}", "")
It will remove the date and give you the string to compare.
This is working fine for text compare. But for image comparison it fails. Is there a way to remove the date from the PDF?
In image mode, it does a pixel-by-pixel compare. It is a very sensitive comparison, and masking certain areas is not a simple, straightforward approach.
Thanks for the response, will try this.
Can you please let us know how to compare a PDF when it has a watermark or watermark layer on it? Also, can you please let us know how to delete that watermark? This utility has helped us greatly.
Hmm, well, that is going to be tough! It can be done. You could use text mode for the time being.
Thanks for your quick response ,really appreciated and thanks for your good work.
Hi, I am trying to use this utility in VBScript. My requirement is to compare two PDF files on the command line and generate the difference file in a specific location. Using the JAR, I am unable to set the image destination path on the command line. Please guide me on how to:
1. set the destination image path on the command line
2. compare the images on the command line and save the difference image
I just updated the source code. Please pull from GitHub and build the JAR file yourself.
Find the instructions here: https://github.com/vinsguru/pdf-util
We had the same requirement. I modified the main class and rebuilt the JAR using the Maven build file. We found that the comparison needed to be page by page, or else you don’t get a diff image per page. We also needed a non-zero System exit value to make it useful in the test environment.
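A wrapper along those lines can be sketched as follows: it validates the arguments, runs a comparison and turns the outcome into an exit code (0 for a match, 1 for a mismatch, 2 for wrong usage) that a script or test environment can check. PdfCompareCli and these exit codes are hypothetical choices; the comparison is passed in as a BiPredicate so the sketch assumes nothing about PDFUtil's internals.

```java
import java.util.function.BiPredicate;

// Hypothetical command-line wrapper: exit code 0 when the PDFs match,
// 1 when they differ, 2 on wrong usage.
final class PdfCompareCli {

    private PdfCompareCli() {}

    static int run(String[] args, BiPredicate<String, String> comparer) {
        if (args.length != 2) {
            System.err.println("Usage: java -jar pdf-compare.jar <file1.pdf> <file2.pdf>");
            return 2;
        }
        boolean same = comparer.test(args[0], args[1]);
        System.out.println(same ? "PDFs match" : "PDFs differ");
        return same ? 0 : 1;
    }
}
```

A main method could then end with System.exit(PdfCompareCli.run(args, comparer)), where comparer wraps new PDFUtil().compare(file1, file2) and converts any checked exception it throws into an unchecked one.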
Ok, nice. Glad that you find it useful.
Exception in thread “main” java.lang.UnsupportedClassVersionError: com/testautomationguru/utility/PDFUtil : Unsupported major.minor version 52.0
I am facing this issue while running a Java program. Could you please help me resolve it?
You might need JRE 8
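Class-file major version 52 corresponds to Java 8, so this error appears when a JAR compiled for Java 8 runs on an older JRE; java -version shows which runtime is actually in use. To check which Java release a class needs, the small helper below (hypothetical, not part of PDFUtil) reads the major version from the class-file header, which starts with the bytes 0xCAFEBABE followed by a two-byte minor and a two-byte major version.

```java
// Hypothetical helper: read the class-file version that the JVM reports as
// "major.minor version NN" in UnsupportedClassVersionError.
final class ClassFileVersion {

    private ClassFileVersion() {}

    // The major version is stored big-endian in bytes 6 and 7,
    // after the 0xCAFEBABE magic number and the two-byte minor version.
    static int majorVersion(byte[] classFile) {
        if (classFile.length < 8
                || (classFile[0] & 0xFF) != 0xCA || (classFile[1] & 0xFF) != 0xFE
                || (classFile[2] & 0xFF) != 0xBA || (classFile[3] & 0xFF) != 0xBE) {
            throw new IllegalArgumentException("Not a class file");
        }
        return ((classFile[6] & 0xFF) << 8) | (classFile[7] & 0xFF);
    }

    // Major versions 49 and later map to Java releases as (major - 44): 52 -> Java 8.
    static int javaRelease(int majorVersion) {
        if (majorVersion < 49) {
            throw new IllegalArgumentException("Pre-Java 5 class file: " + majorVersion);
        }
        return majorVersion - 44;
    }
}
```

Fed with the bytes of a .class entry from a JAR built for Java 8, majorVersion returns 52, and javaRelease(52) returns 8.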
Hi, I am finding an issue: when both images have differences, the resulting image does not highlight them. Also, it would be nice if you could provide a side-by-side comparison.
The number of changed pixels could be very small, maybe just 1; that is why you are unable to notice the difference. I will see if we can add a threshold.
Hi, if possible, can you please help me with watermark removal code? It is really important for me, and it would be of great help if you could do that.
Hi, I want to compare PDF files pixel by pixel, but this is not comparing. Can you show me how to execute this code?
What error do you get? This article already explains how to do it!
This is really very nice.
I would also like to know whether the following features, which are to be added soon, are available on GitHub:
While comparing PDFs – ignore certain text using Regular Expression
For example, 2 PDFs have same text & contains date on which it was generated which needs to be omitted while comparing.
While comparing PDFs in VISUAL_MODE, ignore certain areas.
While comparing PDFs in VISUAL_MODE, return true / false based on certain threshold / sensitivity.
I too want to know the same. Karuna, did you get any help on this?
Adding a comment to follow up.
Thank you so much, vlns… I am new to Selenium and do not understand Git, but I was able to download the JAR file. I just wanted to know which import statement I need to write in Eclipse after adding this JAR to my Referenced Libraries.
This example will help you – https://github.com/vinsguru/pdf-util/blob/master/src/test/java/com/testautomationguru/utility/PDFUtilTest.java
Thank you, vlns. This helped me. But my PDF has 4-5 lines at the bottom of each page containing an image with dynamic content. Is there a way I can crop them out (remove/ignore them) before comparison in visual mode? Please help.
Hi Vlns
Your work is marvelous, but pixel-by-pixel comparison is much slower compared to a licensed tool (StreamDiff). Do you have any idea how to increase the speed of comparison?
Waiting for your response!
Hi Vlns,
This works wonders… Thank you so much.
But for pixel-by-pixel comparison: my PDF has 3 pages, and there were some differences on all 3 pages, but the result image that captures the differences only shows the 1st page. Can you please help with how to show the differences on all pages of the PDF, not just the 1st page?
By default, it exits as soon as a mismatch is found. If you want all pages to be compared, you can call pdfUtil.compareAllPages(true) before comparing.
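The default behavior described here, together with the page-count check that the library performs first, can be pictured as the following loop. It is a hypothetical sketch, not PDFUtil's code: PageLoop is an invented name, pages are generic values, and samePage stands in for whatever per-page comparison (text or pixels) is used.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch of page-by-page comparison with an early-exit switch,
// mirroring the behavior described for compareAllPages(false/true).
final class PageLoop {

    private PageLoop() {}

    // Returns true when all pages match. Mismatching 1-based page numbers are
    // added to 'mismatchedPages'. A page-count mismatch fails immediately.
    static <P> boolean compare(List<P> pages1, List<P> pages2, BiPredicate<P, P> samePage,
                               boolean compareAllPages, List<Integer> mismatchedPages) {
        if (pages1.size() != pages2.size()) {
            return false; // page counts differ: fail before comparing any page
        }
        boolean allSame = true;
        for (int i = 0; i < pages1.size(); i++) {
            if (!samePage.test(pages1.get(i), pages2.get(i))) {
                allSame = false;
                mismatchedPages.add(i + 1);
                if (!compareAllPages) {
                    break; // default: stop at the first mismatching page
                }
            }
        }
        return allSame;
    }
}
```

With compareAllPages set to false, the sketch records only the first mismatching page; with true, it records every one, which corresponds to getting a difference image for each mismatching page.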
Thank you, Vlns. This worked. But when the number of pages in the PDFs is not the same, it does not produce the difference images. Is there any way I can capture the pixel differences on all pages even if the page counts do not match?
Hi Team,
I admire your work very much. I want to bring to your notice that the image generated by the statement below
pdfRenderer1.renderImageWithDPI(iPage, 72, ImageType.RGB);
is https://drive.google.com/file/d/0B18WGCjoaDzJQXVnYVdDand3cWs/view
which is odd, and comparing a single page of two PDFs takes 5 seconds on average.
Could you please suggest how to correct the image generation and increase the speed of comparison?
Waiting for your response (positive or diplomatic, ready to receive).
Hi Team, any reply is fine with me. Could you help?
HI
I am comparing two PDFs, and I have enabled the logs too.
An ArrayIndexOutOfBoundsException is thrown:
WARNING: The end of the stream doesn’t point to the correct offset, using workaround to read the stream, stream start position: 5903, length: 0, expected end position: 5903
Apr 02, 2017 9:20:44 AM com.testautomationguru.utility.PDFUtil convertToImageAndCompare
INFO: Comparing Page No : 1
Apr 02, 2017 9:20:44 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
WARNING: Image stream is empty
java.lang.ArrayIndexOutOfBoundsException: Coordinate out of bounds!
at sun.awt.image.IntegerInterleavedRaster.getDataElements(IntegerInterleavedRaster.java:219)
at java.awt.image.BufferedImage.getRGB(BufferedImage.java:986)
at com.testautomationguru.utility.ImageUtil.compareAndHighlight(ImageUtil.java:19)
at com.testautomationguru.utility.PDFUtil.convertToImageAndCompare(PDFUtil.java:458)
I need your email address so that I can share the PDFs.
Can you send the PDF files to vino.infy@gmail.com?
This is an awesome utility. I have one question: we want to change the color of the differences, and instead of overlapping the differences, we want them shifted to the left side. Is that possible?
You can change the color.
pdfutil.highlightPdfDifference(Color.BLACK);
And what do you think about shifting the differences to the left of the source PDF file? We have two PDF files listing resource prices, and we want to compare the differences. But because of the overlap, we cannot read the baseline and actual files.
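One way to avoid the overlap is to place the two rendered pages next to each other instead of on top of each other. The sketch below does that with plain Java2D; SideBySide and its gap parameter are hypothetical and not part of PDFUtil, and it assumes both pages are already available as BufferedImage objects (the actual page could be one whose differences have already been highlighted).

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical sketch: place baseline and actual pages next to each other
// instead of overlaying them, so both versions stay readable.
final class SideBySide {

    private SideBySide() {}

    static BufferedImage combine(BufferedImage baseline, BufferedImage actual, int gap) {
        int width = baseline.getWidth() + gap + actual.getWidth();
        int height = Math.max(baseline.getHeight(), actual.getHeight());
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height); // white background, including the gap
            g.drawImage(baseline, 0, 0, null); // baseline on the left
            g.drawImage(actual, baseline.getWidth() + gap, 0, null); // actual on the right
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

The combined image can be written out with javax.imageio.ImageIO.write(image, "png", file) like any other BufferedImage.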
Hi Vlns,
This utility is really helpful, but I am facing one issue. I use this utility as a JAR in my class and pass the destination with “pdfUtil.setImageDestinationPath(DestPath);”, where DestPath is a string passed in along with the two PDFs: “public static String pdfMatchMethod(String pdf1Path, String pdf2Path, String DestPath)”. I exposed my class as a web service, but when I call it from the client proxy class with these parameters, the image is not saved in the given path.
Can you please help me solve this issue? Why is it not downloading from the web service to my local machine?
Hi, Can you please enable the log and see if you get any exceptions.
Can you please tell me how to compare the font and alignment of PDFs with this library? Also, with this library it is not storing the difference image:
pdfUtil.setImageDestinationPath("c:/imgpath");
pdfUtil.compare(file1, file2);
No exception, but no image in the folder either.
Please enable the log and check whether you get any exception. Also make sure you use the image (visual) mode comparison.
Hi Vlns,
This utility is really helpful. When we tried to use the JAR and run it, no output was printed. Could you please help?
Please share more details and ensure that you use the API correctly as per the examples provided.
Pretty impressive. Congrats.
Hi vlns,
If the two PDFs being compared are completely different, no image is generated. Can you please explain this behavior?
Yes, the very first check in the PDF comparison is whether the number of pages in both PDFs matches. If they do not match, it fails immediately.
Hi, the output image is not getting generated for me. Could you please help?
Can you enable the log in the pdfutil and check what is happening?
Hi, I want to change the color of the highlighted differences in the result image. Where should I change it? Also, the comparison seems to overlap, so where can I change it to add a gap?
The PDFUtil library has a method to change the color of the highlighted differences.
Hi,
I have 2 files: File A with 5 pages and File B with 10 pages. Is it possible to compare the first 5 pages of File A and File B? I am getting the error “files page counts do not match – returning false” when I try to compare these files.
Not currently. It exits as soon as the page counts of the PDFs differ. We can modify that.
Hi Team,
// compare the pages from 1 to 5
pdfUtil.compare(file1, file2, 1, 5);
The above comparison doesn’t work for me. It always compares the first page in file1 and file2 and skips the rest of the pages. Please help with this.
It’s a very useful tool, thanks for sharing.
May I know when this tool will support ignoring certain areas/content in visual comparison mode? Thanks a lot in advance.
Awesome!!
Hi ,
Could you please provide the code to compare two PDF files and write the mismatches to a separate image file?
The code is available in github with examples. please check there.
As I mentioned, I tried the code but was getting the error mentioned in my previous email:
java.io.IOException;error end of file ,expected line
Hi Vins,
This utility is really helpful, thanks for sharing it 🙂
Quick clarification
The one below is not working. It always compares the first page in both files and skips the rest of the pages. Please help me with this.
pdfUtil.compare(file1, file2, 1, 3);
That is because it has already found a mismatch, so there is no point in proceeding further. PDF comparison is a bit time-consuming, so exiting early saves some time. Before comparing, you could use
pdfUtil.compareAllPages(true);
which compares all pages.
Hi,
I am getting the version 52 error when I run this. Please suggest.
Thanks vIns, it’s comparing all the pages now.
But no image is saved if there is no change on a particular page.
Say, for example, I have a PDF with 2 pages: there is some difference on the 1st page and no change on the 2nd page. In this scenario, it saves an image only for the 1st page, since there is no change on the 2nd page, but I want an image saved for the 2nd page as well. Could you please help me with this?
Hi,
Can anyone help me understand why I am getting the error below?
Exception in thread “main” java.lang.UnsupportedClassVersionError: sikuli/nm/gh : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:472)
Picked up JAVA_TOOL_OPTIONS: -agentlib:jvmhook
Picked up _JAVA_OPTIONS: -Xrunjvmhook -Xbootclasspath/a:"C:\Program Files (x86)\HP\Unified Functional Testing\bin\java_shared\classes";"C:\Program Files (x86)\HP\Unified Functional Testing\bin\java_shared\classes\jasmine.jar"
Code used:
PDFUtil pdfUtil = new PDFUtil();
//pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);
String file1 = "C://Program Files//Java//eclipse//capture.pdf";
String file2 = "C://Program Files//Java//eclipse//capture1.pdf";
// compares the pdf documents & returns a boolean
// true if both files have same content. false otherwise.
//pdfUtil.compare(file1, file2);
// compare the 1st page alone
pdfUtil.compare(file1, file2, 1, 1);
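For context on the error above: class-file version 52.0 corresponds to Java 8, and UnsupportedClassVersionError means the JVM that loaded the class is older than that. Because the stack trace goes through LauncherHelper.checkAndLoadMain, the class being loaded, sikuli/nm/gh, appears to be the program's own main class. The usual fixes are to run it with a Java 8 or newer JRE, or to compile it for the older target. Below is a small stdlib-only sketch (the class name is hypothetical) that reads the version straight from a .class file header, as laid out in the JVM specification: a 4-byte magic number, then minor_version and major_version as unsigned 16-bit values.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper: prints the class-file version of a .class file
// next to the version of the JVM that is running it.
public class ClassFileVersion {

    // Reads the class-file header: magic (u4), minor_version (u2), major_version (u2).
    public static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file");
        }
        data.readUnsignedShort(); // minor_version, ignored here
        return data.readUnsignedShort(); // major_version
    }

    // For class files from Java 5 (major 49) onward: Java release = major - 44.
    public static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileVersion <file.class>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int major = majorVersion(in);
            System.out.println(args[0] + ": major " + major + " (Java " + javaRelease(major) + ")");
        }
        System.out.println("running JVM: Java " + System.getProperty("java.specification.version"));
    }
}
```

If the class reports major 52 while the running JVM reports something like 1.7, that mismatch is exactly what the exception describes.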
vIns, it's very useful. Thanks.
But for a few PDFs, no exception is generated and no image is stored in the folder.
I didn't get any logs in the console.
Could you please help me enable logging in PDFUtil?
pdfUtil.enableLog() should give some information on what is going on.
Does it support editable PDF files (type pdf/x)?
Best regards
It is mostly just a comparison utility. It does not support typing at the moment.
Hi,
I tried to compare two PDFs using the code below:
//if you need to store the result
pdfUtil.highlightPdfDifference(true);
pdfUtil.setImageDestinationPath("c:/imgpath");
boolean results= pdfUtil.compare(file1, file2);
System.out.print(results);
pdfUtil.enableLog();
Running the code in IntelliJ, I get the message below:
Jul 18, 2018 11:59:37 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2
INFO: OpenType Layout tables used in font CIDFont+F1 are not implemented in PDFBox and will be ignored
false
Process finished with exit code 0
But I can't see the output on my C: drive. Please help and advise.
Call enableLog() before compare() and check the log.
Very helpful and easy-to-use library.
I want to vote for the feature “ignore certain areas in visual mode”.
Then it would be perfect for us.
Very useful for my testing. Thank you so much. Well done!
Hi, this is a very good library. I tried to run it on a Mac and set the image destination path to /Users/mymac/Documents/imagepath, but no result image is generated and there is no error in the console. Am I missing anything here?
My code:
PDFUtil pdfUtil = new PDFUtil();
pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);
pdfUtil.highlightPdfDifference(true);
pdfUtil.setImageDestinationPath("/Users/mymac/Documents/imgpath");
pdfUtil.compare(file1, file2);
Do you have permission to write the results to that directory? You can also enable logging to see what's going on: pdfUtil.enableLog(level)
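Two quick things to rule out when no output appears: whether the destination directory exists, and whether the Java process can write to it. This thread doesn't confirm whether PDFUtil creates a missing destination directory itself, so this hypothetical helper creates it up front using java.nio.file only. The default path is the one from the question above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFUtil.
public class DestinationCheck {

    // Creates the directory (and any missing parents) if needed and reports
    // whether the current process can write to it.
    public static boolean prepareWritableDir(Path dir) {
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            // e.g. the path exists but is a regular file, or permission is denied
            System.err.println("Cannot create " + dir.toAbsolutePath() + ": " + e);
            return false;
        }
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/Users/mymac/Documents/imgpath");
        System.out.println(dir.toAbsolutePath() + " writable: " + prepareWritableDir(dir));
    }
}
```

Running this before pdfUtil.compare(...) separates a permissions problem from a comparison that simply didn't produce an image.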
Not OP. Enabling the logging [pdfUtil.enableLog();] helped me with this issue.
It showed that the two PDFs did not have the same page count, which prevented any image from being produced.
Is there a way to get an image of every page difference or does it stop at the first difference?
Actually, it stops as soon as the PDF page counts differ!
This is really a great utility. Appreciate your work. Thank you for this.
Will this library help with the comparison if the page counts of the expected and actual files don't match?
Not as of now. But it can provide the content for given pages, so you can do the comparison yourself by getting the text.
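A minimal sketch of "doing the comparison yourself", assuming you have already collected one string per page. With the calls shown in the article, that would presumably be getPageCount(file) plus getText(file, i, i) for each page i. The hypothetical helper below compares the pages both files share and then reports the extra pages in the longer file instead of giving up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: compares per-page text even when the page counts differ.
public class PageTextCompare {

    public static List<String> differences(List<String> expectedPages, List<String> actualPages) {
        List<String> report = new ArrayList<>();
        int common = Math.min(expectedPages.size(), actualPages.size());
        // Pages present in both files are compared one to one.
        for (int i = 0; i < common; i++) {
            if (!expectedPages.get(i).equals(actualPages.get(i))) {
                report.add("page " + (i + 1) + " differs");
            }
        }
        // Any remaining pages exist in only one of the files.
        for (int i = common; i < expectedPages.size(); i++) {
            report.add("page " + (i + 1) + " only in expected");
        }
        for (int i = common; i < actualPages.size(); i++) {
            report.add("page " + (i + 1) + " only in actual");
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("Title", "Body v1", "Appendix");
        List<String> actual = Arrays.asList("Title", "Body v2");
        differences(expected, actual).forEach(System.out::println);
    }
}
```

The main method prints "page 2 differs" and "page 3 only in expected". In practice you may want to normalise whitespace before calling equals(), since text extraction can vary slightly between otherwise identical pages.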
Thank you so much for this utility. It helped me a lot. However, my requirement needs functionality to ignore a few parts of the PDFs during comparison. I used your project and made further changes to accommodate my requirement. I appreciate your efforts. Thanks again.
This tool is really good. I have one suggestion.
How can I print the result PDFs without the actual contents? I'd expect the tool to highlight the changes in the actual and expected files individually, without writing any content from the actual file into the expected one or vice versa.
Is this tool still working and usable? I ask because I saw on the Maven site that the last changes were made about 2 years ago. Please let me know.
Thanks
Jeff
Hello Sir, yes, it should still work. It is a simple utility with basic functionality.
Hi Vins,
Good to know. Now, if possible, how can I find out what the differences are in terms of text sections? Would that be when I choose text-only comparison? Is there a way to define a threshold, as a percentage of differences, for a test to pass? I mean, if only 10% differs, it would still pass. Also, do you happen to know whether Adobe provides some sort of API to compare several PDF files? Please let me know.
Thanks
Jeff
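PDFUtil doesn't offer such a threshold as far as this thread shows, but with the text from getText() you could compute one yourself. This is a hypothetical sketch, and the metric is an assumption: split both texts into lines, find the longest common subsequence (LCS) of lines, define similarity as 2 × LCS / (expected lines + actual lines), and report 100 × (1 − similarity) as the difference percentage.

```java
// Hypothetical sketch: a text "difference percentage" based on the longest
// common subsequence (LCS) of lines. Not a PDFUtil feature.
public class TextDiffThreshold {

    public static double differencePercent(String expected, String actual) {
        String[] e = expected.split("\\R", -1); // \R matches any line break
        String[] a = actual.split("\\R", -1);
        // lcs[i][j] = length of the LCS of e[i..] and a[j..]
        int[][] lcs = new int[e.length + 1][a.length + 1];
        for (int i = e.length - 1; i >= 0; i--) {
            for (int j = a.length - 1; j >= 0; j--) {
                lcs[i][j] = e[i].equals(a[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        double similarity = 2.0 * lcs[0][0] / (e.length + a.length);
        return 100.0 * (1.0 - similarity);
    }

    public static boolean passes(String expected, String actual, double maxDifferencePercent) {
        return differencePercent(expected, actual) <= maxDifferencePercent;
    }

    public static void main(String[] args) {
        String expected = "Invoice\nCustomer: ACME\nTotal: 100\nThank you";
        String actual = "Invoice\nCustomer: ACME\nTotal: 120\nThank you";
        System.out.println(differencePercent(expected, actual) + "% different, passes at 10%: "
                + passes(expected, actual, 10.0));
    }
}
```

With one of four lines changed, the LCS is 3 and the similarity is 6/8, so the texts are 25% different and a 10% threshold fails. Comparing whole lines is a design choice: a one-character change inside a long line counts as a fully changed line. The DP table is (lines × lines) in size, which is fine for page-sized text.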
Also, can you please confirm whether I'll be able to exclude, for example, a date from the files and then compare them visually? The examples only show that approach for text comparison. Please let me know.
Thanks
Jeff
Text comparison is relatively easy, and you can exclude text. Visual comparison is harder, and this utility currently does not support excluding areas. I have already described all the available methods here.
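As the reply says, the utility does not support ignoring areas in visual mode. For anyone adding this in their own fork, one common approach is to mask rectangles before counting differing pixels. The sketch below is hypothetical and is not a PDFUtil feature. Rectangles are in pixels of the rendered page with y measured from the top edge, so a region measured in PDF points (1/72 inch) has to be scaled by dpi / 72. Note that PDF user space normally measures y from the bottom, so that axis may also need flipping.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: pixel comparison that skips "ignored" rectangles.
public class MaskedCompare {

    // Counts differing pixels, skipping any pixel inside one of the ignored rectangles.
    public static int countDifferences(BufferedImage a, BufferedImage b, List<Rectangle> ignored) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("Page images must have the same size");
        }
        int diff = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (!isIgnored(x, y, ignored) && a.getRGB(x, y) != b.getRGB(x, y)) {
                    diff++;
                }
            }
        }
        return diff;
    }

    private static boolean isIgnored(int x, int y, List<Rectangle> ignored) {
        for (Rectangle r : ignored) {
            if (r.contains(x, y)) {
                return true;
            }
        }
        return false;
    }

    // Converts a region given in PDF points (1/72 inch, y from the top edge)
    // to pixels at the DPI the page was rendered with.
    public static Rectangle fromPoints(double x, double y, double w, double h, float dpi) {
        double s = dpi / 72.0;
        return new Rectangle((int) Math.floor(x * s), (int) Math.floor(y * s),
                (int) Math.ceil(w * s), (int) Math.ceil(h * s));
    }

    public static void main(String[] args) {
        BufferedImage expected = new BufferedImage(200, 200, BufferedImage.TYPE_INT_RGB);
        BufferedImage actual = new BufferedImage(200, 200, BufferedImage.TYPE_INT_RGB);
        actual.setRGB(10, 10, 0xFFFFFFFF); // e.g. a changed date stamp
        List<Rectangle> ignore = Arrays.asList(fromPoints(0, 0, 36, 36, 144));
        System.out.println("differences outside ignored areas: "
                + countDifferences(expected, actual, ignore));
    }
}
```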
Hi,
The zip download seems to be corrupt. Could you update it or email a working copy, please?
Fixed now. Sorry about that.
Hello Vinoth Selvaraj,
First of all, thanks for sharing this. I would like to clarify the license. Is it GNU? May I use it in non-profit or commercial applications?
Dmitrii
I should add the open source license. Please feel free to use it.
Hi,
At the end of the article it says:
Features to be added soon:
While comparing PDFs in VISUAL_MODE, ignore certain area.
Is it done yet? Can you configure which area should be ignored?
Thank you very much,
Robert
Sorry! I no longer work on that utility!
Hello,
I am curious about the excludeText() method. How does it actually work?
Please check the code in github.
Thanks for the response. Here is the problem, for example: two PDF documents have the same date format but different dates, i.e. one document has the date 05/03/21 and the other has 23/03/21. Since this method uses equalsIgnoreCase(), it does not look at the date format; it only checks whether the text is equal, so the test fails because the dates differ. I want to simply ignore these dates before the comparison.
Can we use excludeText() in visual mode?
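What needs to be removed here is a date pattern, not the literal dates. This assumes excludeText() treats its arguments as regular expressions and removes the matches before the case-insensitive comparison; check the GitHub source to confirm. If so, a pattern like the one below should cover both dates. The standalone sketch illustrates the idea with java.util.regex only. The class name and the dd/mm/yy(yy) format are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: strip dates with a regex, then compare the remaining text.
public class ExcludeDates {

    // Assumed format: dd/mm/yy or dd/mm/yyyy, e.g. 05/03/21 or 23/03/2021
    private static final Pattern DATE = Pattern.compile("\\b\\d{2}/\\d{2}/(\\d{4}|\\d{2})\\b");

    public static String stripDates(String text) {
        return DATE.matcher(text).replaceAll("");
    }

    // Mirrors the case-insensitive comparison mentioned in the comment above.
    public static boolean equalIgnoringDates(String expected, String actual) {
        return stripDates(expected).equalsIgnoreCase(stripDates(actual));
    }

    public static void main(String[] args) {
        System.out.println(equalIgnoringDates("Report date: 05/03/21", "Report date: 23/03/21"));
    }
}
```

If excludeText() does accept regular expressions, passing it the same pattern string should have the same effect before its own comparison.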