Read PDF content using Selenium

Webner Solutions
2 min read · Jan 31, 2022


Selenium cannot read the contents of a PDF by itself, so to read a PDF file in a Selenium test we can use a Java library called PDFBox. Apache PDFBox is an open-source library for working with PDF files, and we can use it to verify the text or images present in a file. To use it alongside Selenium, either add the Maven dependency to the pom.xml file or add the jar to the project's build path as an external jar.

Here we will add it as an external jar:

  • Download the jar file from:
    https://pdfbox.apache.org/download.html
    I am using version 1.8.16 of the PDFBox jar.
  • In the project, select “Configure Build Path” and add the downloaded file as an external jar.
  • After adding the jar, click the “Apply” and “Close” buttons.
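
For readers who prefer the Maven route mentioned earlier, the equivalent dependency would look roughly like the fragment below. This is a sketch that assumes the same 1.8.16 version used in this post; it goes inside the dependencies section of pom.xml, next to the project's existing Selenium dependency.

```xml
<!-- Apache PDFBox, same 1.8.x line as the external jar used in this post -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>1.8.16</version>
</dependency>
```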

Code to extract the content of the PDF:

package Testing;

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;

import org.apache.pdfbox.pdmodel.PDDocument;
// PDFBox 1.8.x package; in PDFBox 2.x this class lives in org.apache.pdfbox.text
import org.apache.pdfbox.util.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import io.github.bonigarcia.wdm.WebDriverManager;

public class PdfRead {
    public static WebDriver driver;

    public void readPdf() throws Exception {
        // Set up the ChromeDriver binary and start the browser
        WebDriverManager.chromedriver().setup();
        driver = new ChromeDriver();
        driver.manage().window().maximize();

        // Open the PDF in the browser and read back the URL it was served from
        driver.get("https://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf");
        String currentLink = driver.getCurrentUrl();

        // Download the same URL directly and let PDFBox parse the stream
        URL pdfUrl = new URL(currentLink);
        try (InputStream file = new BufferedInputStream(pdfUrl.openStream())) {
            PDDocument document = PDDocument.load(file);
            try {
                // Extract the text of all pages as a single string
                String pdfContent = new PDFTextStripper().getText(document);
                System.out.println(pdfContent);
            } finally {
                document.close(); // release the resources held by the document
            }
        }
    }

    public static void main(String[] args) throws Exception {
        PdfRead read = new PdfRead();
        try {
            read.readPdf();
        } finally {
            // Close the browser even if reading the PDF fails
            if (driver != null) {
                driver.quit();
            }
        }
    }
}

Result: the text extracted from the PDF is printed to the console.
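
Since the stated goal is to verify text in the file, a test would usually assert on the extracted string rather than only print it. The sketch below is one possible way to do that in plain Java. The class PdfTextCheck and its methods are hypothetical helpers, not part of PDFBox or Selenium, and the sample string is made up for illustration. Whitespace is collapsed before comparing, on the assumption that extracted text contains line breaks wherever the PDF layout wraps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for checking extracted PDF text; not a PDFBox or Selenium API.
class PdfTextCheck {

    // Collapse every run of whitespace (spaces, tabs, line breaks) into a single space.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Return the expected phrases that do NOT occur in the extracted text.
    static List<String> missingPhrases(String pdfContent, List<String> expected) {
        String haystack = normalize(pdfContent);
        List<String> missing = new ArrayList<>();
        for (String phrase : expected) {
            if (!haystack.contains(normalize(phrase))) {
                missing.add(phrase);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the string returned by PDFTextStripper.getText(document).
        String pdfContent = "Invoice 1042\nTotal due:\n  250.00 USD";

        List<String> missing = missingPhrases(
                pdfContent, Arrays.asList("Total due: 250.00 USD", "Paid in full"));

        // An empty list would mean every expected phrase was found.
        System.out.println(missing); // prints [Paid in full]
    }
}
```

In a real test, the pdfContent variable from the extraction code would be passed in, and the test would fail when the returned list is not empty.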


Originally published at https://blog.webnersolutions.com on January 31, 2022.
