Overview:
In this article, we will see how we could use Selenium WebDriver to find broken links on a webpage.
Utility Class:
To demonstrate how it works, I will be using the simple utility class below. Its method returns the HTTP response code for a given URL.
import java.net.HttpURLConnection;
import java.net.URL;

public class LinkUtil {

    // hits the given URL and returns the HTTP response code
    // returns 0 if the request fails (for example, an unknown host)
    public static int getResponseCode(String link) {
        HttpURLConnection con = null;
        int responseCode = 0;
        try {
            URL url = new URL(link);
            con = (HttpURLConnection) url.openConnection();
            responseCode = con.getResponseCode();
        } catch (Exception e) {
            // ignore - 0 will be returned for unreachable links
        } finally {
            if (null != con)
                con.disconnect();
        }
        return responseCode;
    }
}
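For a quick sanity check, the utility can be called directly; the URL below is just an example, any reachable link works.
int code = LinkUtil.getResponseCode("https://www.yahoo.com");
System.out.println(code); // prints 200 if the link is good, 0 if the request failed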
Usage:
The rest is simple. Find all the elements that have an href (or src) attribute, then use the above utility class to collect the response codes for all the links and group them by response code.
driver.get("https://www.yahoo.com");
Map<Integer, List<String>> map = driver.findElements(By.xpath("//*[@href]"))
        .stream()                                                   // find all elements which have the href attribute & process them one by one
        .map(ele -> ele.getAttribute("href"))                       // get the value of href
        .map(String::trim)                                          // trim the text
        .distinct()                                                 // there could be duplicate links, so keep only unique ones
        .collect(Collectors.groupingBy(LinkUtil::getResponseCode)); // group the links based on the response code
Now we can access the URLs based on the response code we are interested in.
map.get(200) // will contain all the good URLs
map.get(403) // will contain all the 'Forbidden' URLs
map.get(404) // will contain all the 'Not Found' URLs
map.get(0)   // will contain all the URLs whose host could not be reached (the utility returns 0 when the request fails)
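If we want a quick summary of everything that was collected, we can also loop over the same map and print each response code along with its links; a minimal sketch:
map.forEach((responseCode, links) -> {
    System.out.println("HTTP " + responseCode + " -> " + links.size() + " link(s)");
    links.forEach(link -> System.out.println("    " + link));
});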
We can simplify this even further: just partition the URLs based on whether the response code is 200 or not.
Map<Boolean, List<String>> map = driver.findElements(By.xpath("//*[@href]"))
        .stream()                                                   // find all elements which have the href attribute
        .map(ele -> ele.getAttribute("href"))                       // get the value of href
        .map(String::trim)                                          // trim the text
        .distinct()                                                 // there could be duplicate links, so keep only unique ones
        .collect(Collectors.partitioningBy(link -> LinkUtil.getResponseCode(link) == 200)); // partition based on response code
Now we can simply access the map to list the good or bad URLs as shown here.
map.get(true)  // will contain all the good URLs
map.get(false) // will contain all the bad URLs
Print all the bad URLs.
map.get(false)
   .stream()
   .forEach(System.out::println);
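As mentioned earlier, resources referenced through the src attribute (images, scripts etc.) can be verified the same way. A minimal sketch, assuming we pick whichever of href / src is present on each element:
Map<Boolean, List<String>> resourceMap = driver.findElements(By.xpath("//*[@href or @src]"))
        .stream()
        .map(ele -> ele.getAttribute("href") != null ? ele.getAttribute("href") : ele.getAttribute("src"))
        .map(String::trim)
        .distinct()
        .collect(Collectors.partitioningBy(link -> LinkUtil.getResponseCode(link) == 200));

resourceMap.get(false).forEach(System.out::println); // print the broken links / resources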
Happy Testing & Subscribe 🙂