How to Find Broken Links using Selenium WebDriver? | Selenium Interview Questions

Video Statistics and Information

Captions
Hello guys, I'm Yadagiri Reddy, and welcome to the series of Selenium interview questions. In this video, we will see how to find the broken links on a website using Selenium WebDriver. So what is a broken link? A broken link is a link on a website that is no longer accessible or no longer exists. For example, as a customer you visit a website and click on a link, and you expect that link to open a web page. Instead, it shows an error code like 404, or it says "this site cannot be reached". Those links are called broken links, because they show errors instead of the proper web pages. This can happen for many reasons. One reason is that the page has been removed from the web application itself, so it is deleted from that website. The second reason is that the website structure has been modified. Maybe the web page was previously under one folder, say abc, and later it was moved to an xyz folder, but they forgot to update the links. In that case the link still points to the abc folder and tries to find the web page there, while the page is now under xyz. In this kind of scenario the page will not load either; it says 404, page not found. Collectively, all of these are called broken links.
In this video we will see how to find these broken links using Selenium WebDriver. Now that we have discussed what a broken link is, let me show you how it looks in real applications. Here I have an application, H Y R Tutorials. In this application we have a menu control, and if you mouse hover on Selenium Practice we have something called Broken Links, which has a sub-menu as well. If you click on any of these links you can observe the 404 error: this page is not present. Like that, I have some other websites also, The World's Worst Website Ever and Deadlink City, so you can use any of these websites for practice today. Here, if I click on this Google Maps link, you can see error 404, not found. You should not look at the UI, because the UI might change; here the UI is like this, and here it is like this. Focus on the code instead: the error code here is 404, and here it is also 404. The web page is not displayed, which means this link is not a proper link; it is a broken link. In a similar way we have some other links on this website; see, 404. Now let me open Eclipse. Here I have created one class file with some code. This code will launch the Chrome browser and navigate to the hyrtutorials website, and here I will write the code to test these broken links. First we need to understand one thing very clearly: a broken link is also just a link. It is not a new HTML control or something; it is a link only.
That means, first of all, we need to get all the links present on that web page; then we need to iterate over each and every link and verify whether it is a proper link or a broken link. To get all the links I'm going to write driver.findElements. I want all the links, not a single link, so I use findElements. The best way to get all the links is by tag name, because the link tag name is a; if you use the tag name a, it gives you all the links in that web page. You can use an XPath also, //a, and that will do the same job for you. findElements returns a list of web elements, so I need to store it in a variable: a List of WebElement that I will name links. Now I have all the links in this links variable, so I'll print the list size just for reference: links.size(). After this we are going to iterate through each and every link. You can use a for loop or a for-each loop; I'm going to use the for-each loop because it is the easiest one, so from this links list I get one link every time. Now we need to understand one more thing. If you observe any link, let me take this Google Maps one, the link syntax is an anchor tag with something called the href attribute. The href attribute stores the address the link has to navigate to, so when you click on the link it navigates to that website. Now we need to get this address, so how can we do that?
We have a method called getAttribute. First I'll store that address in a variable, linkUrl. Here I'm not going to use driver, because I want the attribute of this particular link: link.getAttribute. Inside it we pass the attribute name, href, and it returns the link address, which I'm storing in linkUrl. Here we need to understand one thing. From the UI, when you click on this link it shows the 404 error, so we understand that this is a broken link. But how are we going to verify that from the automation perspective? You are not going to inspect the page and look for this 404 text. When you click on any link, it creates a request to the server: you are sending one URL to the server, and the server tries to identify that web page. If the web page is found, it returns the response with the web page; if the web page is not found, it sends the response with an error code. Based on that code we can tell whether this is a broken link or a proper link, so I'm going to do the same thing. We got the URL in the form of text, which means it is stored in a String, but I want to create a URL instance. URL is a class in Java, so I write URL url = new URL and pass this text, linkUrl. Now the URL instance is created, and next we need to open the connection to that server, so I will say url.openConnection. Once you open the connection, it returns a URLConnection object.
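The step from the href text to a connection object can be sketched roughly as below. This is only an illustration under a few assumptions: it uses java.net alone, so the href strings are hard-coded where the video reads them with link.getAttribute("href"); the class name LinkUrlSketch, the helper openLink, and the example.com address are placeholders that do not come from the video. Returning null for text that Java cannot parse as a URL is also an added choice, since the video does not cover that case.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper class; the names here are not from the video.
class LinkUrlSketch {

    // Turns the text of an href attribute into a URLConnection.
    // Returns null when the text is missing or cannot be parsed as a URL
    // (for example "javascript:void(0)"); that fallback is an assumption.
    static URLConnection openLink(String linkUrl) {
        if (linkUrl == null || linkUrl.trim().isEmpty()) {
            return null;
        }
        try {
            URL url = new URL(linkUrl);   // same constructor the video uses
            return url.openConnection();  // only creates the object; nothing is sent yet
        } catch (IOException e) {         // MalformedURLException is an IOException
            return null;
        }
    }

    public static void main(String[] args) {
        URLConnection connection = openLink("https://example.com/page"); // placeholder URL
        System.out.println(connection instanceof HttpURLConnection);
        System.out.println(openLink("javascript:void(0)"));
    }
}
```

Note that openConnection() does not contact the server by itself, which is why the walkthrough continues with a separate connect step.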
Now I need to store that URLConnection. In web applications we usually have two types of request, HTTP and HTTPS. For HTTP we have the class HttpURLConnection, and in a similar way there is a class for HTTPS as well, so you can use either class to send the request to the server. So far we have only opened the connection, but we want to send the request, and for that I need an HttpURLConnection object. HttpURLConnection is an abstract class; if you go into it you can see that. That means you cannot create an instance of it directly, but you can assign the object returned by openConnection to an HttpURLConnection variable. Let me do this. When I assign it like this, it asks me to cast, so I provide the cast, and now the URLConnection is converted into an HttpURLConnection object. Using this one we can send the request: I'll say connect, and this will make the connection to the server. Before this I recommend setting a timeout, because sometimes you will get timeout exceptions: when you are connecting, if it takes more time than expected because of your internet speed or something, you get a timeout exception. How do we set the timeout? setConnectTimeout. The argument is an int timeout, and it is not in seconds; it is in milliseconds. If you mouse hover on it you can see "Sets a specified timeout value, in milliseconds". So if I want to give 5 seconds I need to provide 5000, because 5000 milliseconds is 5 seconds.
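A minimal sketch of the cast and the timeout, assuming a plain http or https address. The method the captions call "set connection timeout" is spelled setConnectTimeout in java.net.URLConnection. The class name TimeoutSketch, the helper prepare, and the example.com address are placeholders, and the read timeout is an addition that the video does not mention.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class; the names here are not from the video.
class TimeoutSketch {

    // Opens (but does not yet connect) an HttpURLConnection with timeouts set.
    static HttpURLConnection prepare(String linkUrl, int timeoutMillis) {
        try {
            URL url = new URL(linkUrl);
            // HttpURLConnection is abstract, so the instance comes from
            // openConnection() plus a cast. The cast only works for http/https
            // URLs; other schemes would throw ClassCastException here.
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setConnectTimeout(timeoutMillis); // milliseconds, not seconds
            connection.setReadTimeout(timeoutMillis);    // addition: also bound the wait for a response
            return connection;
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot open " + linkUrl, e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection connection = prepare("https://example.com/", 5000); // 5000 ms = 5 s
        System.out.println(connection.getConnectTimeout());
    }
}
```

Both timeouts have to be set before connect() is called; setting them afterwards has no effect on a connection that is already open.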
Now we have sent the request to the server, so the server will try to identify the web page and send back the response. That response is also available through this HttpURLConnection, and I need to verify it, so I'm going to put an if condition here: httpUrlConnection.getResponseCode. We have a method called getResponseCode; if you go into it, it says 200 means a valid URL, and anything other than 200 is an invalid URL. For example, 401 is Unauthorized, and there are many more codes like that. If you go to Deadlink City you can see the other codes, 400, 401, 402 and so on. Other than 200, anything else is a broken link, so I will put a condition to check for 200; in other words, I'm verifying whether it is a valid URL or not. If it is a valid URL I will print something like this in the console: the linkUrl, then some space, then the status code from getResponseCode, just so the user can identify it, then some space again and the response message. We have response messages as well: httpUrlConnection.getResponseMessage. Here the code is obviously 200. Let me copy this and add it for the else condition too. If it is not 200, it is an invalid URL, a broken link, so there I'm going to use System.err, which prints in red so that we can see it easily. I'm also going to remove the response code from the positive branch, because I already know the response code there is 200. After this, since we have connected and sent the request, we need to disconnect the
connection as well: httpUrlConnection.disconnect. Now let me execute this program. The web page is opened, and once all the URLs are printed in the console the web page will be closed automatically. Now the web page is closed, which means the verification is completed. Here you can see these three are printed in red, which means each is an invalid URL, or maybe something else: here it says Moved Permanently. If you open this web page, it does open, but if you look at the address, it has moved to https. If I copy this URL, you can see it moved to https, but in my application the link refers to http; that is why it says Moved Permanently, and for that 301 is the code. If you come further down, here we have the broken links, 404, so whatever links I intentionally put in are printed here. We are verifying in the correct way, but it is very difficult to scroll through the console and identify which one is valid and which one is invalid. To make it easier, you can create a list or a set, add all the invalid URLs to it, and print only that. You don't need to print the valid ones; we already know those are proper links, so there is no point in printing them. That is why we are going to modify this code. Here I will create a Set of String; I'm using a Set because I don't want to store duplicate URLs, otherwise it would print the duplicates again. I'll name it brokenLinkUrls and instantiate it; you can create any Set here. Let me copy this and modify the code: inside this if condition, instead of checking for 200, I will say not equal,
so if it is not 200, I want to add to this set: brokenLinkUrls.add, and I add this linkUrl to it. After the program completes, even after the driver closes, I will print all these URLs. Let me run the program again. This is not much of a difference, guys, but it prints in a proper way, because you don't want all the URLs printed in the console. This time it will print only the invalid URLs, the broken links, in the console. When the browser closes, we know the verification process is completed. Now the process is completed, and you can see it printed only the broken links. I can then go and verify manually whether each one is working or not and what the issue is, and report it to the developer. This process is very easy, and that is how we find the broken links. Here I want to tell you one more thing that is very important. When you do this at home it will not throw any errors, because you will not have any proxy-related issues. But when you run the script in companies, in service-based companies for example, you will have some kind of proxy. In that case it will not execute properly: it tries to connect to the URL, but because of the proxy it is not able to connect, and then it reports all the URLs as invalid. For that you can add the proxy here. How? After creating this linkUrl, you can create a proxy instance: Proxy proxy = new Proxy. Let me import Proxy; it comes from the java.net package, not the other one. For Proxy we need to select the type, and for the type also you need to pick java.net.Proxy: Proxy.Type is HTTP or SOCKS or
anything; usually it will be HTTP. Then we need to pass the socket address, which means your host name and port number; every proxy has a host name and port number. We create that here with InetSocketAddress. You can see we have a constructor with an InetAddress and one with host name and port, so you can use that one. Here you give the host name, I'm giving a random one for now, and the port will be like 8080 or 443 or something. Now the proxy instance is created, so where do we use it? At the time of the URL connection. Here we are opening the connection, so if you pass this proxy into openConnection you are good to go. You can execute the script behind the proxy as well: even though a proxy is set up in your company, you can still verify the broken links. So this is how we find broken links using Selenium WebDriver. If you have any proxy-related issues you can use this proxy; if you don't, you can remove the proxy and use the normal program. I will upload this program to my GitHub; my GitHub repo is this one, Yadagiri Reddy slash HYR Tutorials. In it you can find a folder, Selenium Interview Questions, and this is program three, so I will upload it there. I will also leave the GitHub link and the links to the practice websites in the description box below. So that is it for this video, guys. If you have any doubts or you are facing any issues, please let me know in the comment section below. Thank you for watching. Bye bye.
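Putting the later steps together, the status check, the Set of broken URLs, and the optional proxy might look roughly like the sketch below. It is not the program from the video's repository. It uses only the JDK, so it takes the link addresses as plain strings; in a Selenium script those would be the href values read with link.getAttribute("href") from driver.findElements(By.tagName("a")). The class and method names, the -1 fallback for links that cannot be requested at all, the read timeout, and the LinkedHashSet (chosen to keep the page order) are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class; the names here are not from the video.
class BrokenLinkChecker {

    // The rule used in the video: anything other than 200 counts as broken.
    static boolean isBroken(int statusCode) {
        return statusCode != HttpURLConnection.HTTP_OK;
    }

    // Builds an HTTP proxy; host and port are whatever your network uses.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // Requests one link and returns its status code, or -1 when the link
    // cannot be requested at all (assumption: such links are reported too).
    static int fetchStatus(String linkUrl, Proxy proxy) {
        HttpURLConnection connection = null;
        try {
            URL url = new URL(linkUrl);
            connection = (HttpURLConnection) (proxy == null
                    ? url.openConnection()
                    : url.openConnection(proxy)); // route through the proxy when given
            connection.setConnectTimeout(5000);   // 5 seconds, in milliseconds
            connection.setReadTimeout(5000);      // addition, not in the video
            connection.connect();
            return connection.getResponseCode();
        } catch (IOException | ClassCastException e) {
            return -1; // unparseable URL, non-http scheme, timeout, refused, ...
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    // Collects the broken links in a Set so duplicates are reported once.
    static Set<String> findBrokenLinks(List<String> linkUrls, Proxy proxy) {
        Set<String> brokenLinkUrls = new LinkedHashSet<>();
        for (String linkUrl : linkUrls) {
            if (isBroken(fetchStatus(linkUrl, proxy))) {
                brokenLinkUrls.add(linkUrl);
            }
        }
        return brokenLinkUrls;
    }

    public static void main(String[] args) {
        // Pass link addresses as arguments; pass httpProxy(host, port)
        // instead of null when running behind a proxy.
        for (String linkUrl : findBrokenLinks(Arrays.asList(args), null)) {
            System.err.println(linkUrl);
        }
    }
}
```

One thing to keep in mind with the "not 200" rule: HttpURLConnection follows redirects by default, but not from http to https, so a link that has simply moved to https comes back as 301 and lands in the set, as in the Moved Permanently case shown in the video. Whether to treat that as broken is a judgment call for the tester.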
Info
Channel: H Y R Tutorials
Views: 4,342
Keywords: hyr tutorials, h y r tutorials, selenium webdriver tutorial, selenium webdriver, selenium tutorial, selenium webdriver interview questions and answers, selenium tutorial for beginners, selenium webdriver tutorial for beginners, selenium, webdriver, java, selenium java, selenium interview questions, How to Find Broken Links using Selenium WebDriver, broken links in selenium webdriver, broken links, find broken links of website, httpurlconnection java, find broken links in selenium
Id: 3liZaog-xXM
Length: 17min 23sec (1043 seconds)
Published: Wed Sep 16 2020