Dr. Mark Humphrys

School of Computing. Dublin City University.



Link checker


Write a Java program to:
  1. Take a URL as a command-line argument.
  2. Catch errors if the URL is bad or is not found.
  3. If the URL is good, download the page.
  4. Extract all links in the page.
    • See Parsing HTML with Java.
  5. Find all broken links.
    • For this exercise, we will narrowly define a "broken" link as any link with an HTTP return code of 404, or a link that times out.
    • For timeout settings see Networking Properties.
  6. Output is a web page:
    • Output the list of broken links to a web page that you can browse (offline), so that you can click on the links.
    • Use this for debugging. If your program claims a link is broken, you can test it here.
    • Do not bother listing any links to Google.
    • Only list URLs that return code 404 or time out. Do not list other URLs.
    • Remove all duplicates.
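
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a model answer. The class name LinkChecker, its method names and the 5-second timeout are all assumptions made for the example. It also uses a naive regular expression to find href attributes, where a real HTML parser would be more robust.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkChecker {
    // Assumed timeout in milliseconds; the exercise does not fix a value.
    static final int TIMEOUT_MS = 5000;

    // Naive matcher for href="..." inside <a> tags (a stand-in for a real HTML parser).
    static final Pattern HREF = Pattern.compile(
        "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Open an HTTP(S) connection with connect and read timeouts set.
    static HttpURLConnection open(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        if (!(conn instanceof HttpURLConnection)) {
            throw new IOException("Not an HTTP(S) URL: " + url);
        }
        conn.setConnectTimeout(TIMEOUT_MS);
        conn.setReadTimeout(TIMEOUT_MS);
        return (HttpURLConnection) conn;
    }

    // Download the page body, assuming UTF-8 text.
    static String download(String url) throws IOException {
        HttpURLConnection c = open(url);
        int code = c.getResponseCode();
        if (code != HttpURLConnection.HTTP_OK) {
            throw new IOException("HTTP " + code + " for " + url);
        }
        try (InputStream in = c.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Collect absolute http/https links, skipping Google and duplicates.
    static Set<String> extractLinks(String html, String baseUrl) {
        Set<String> links = new LinkedHashSet<>(); // a set removes duplicates
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI u = base.resolve(m.group(1).trim()); // resolve relative links
                String scheme = u.getScheme();
                if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) continue;
                String host = u.getHost();
                if (host != null && host.toLowerCase().contains("google.")) continue;
                links.add(u.toString());
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs.
            }
        }
        return links;
    }

    // "Broken" in the narrow sense of the exercise: HTTP 404 or a timeout.
    static boolean isBroken(String url) {
        try {
            HttpURLConnection c = open(url);
            int code = c.getResponseCode();
            c.disconnect();
            return code == HttpURLConnection.HTTP_NOT_FOUND;
        } catch (SocketTimeoutException e) {
            return true;
        } catch (IOException | IllegalArgumentException e) {
            return false; // other failures fall outside the narrow definition
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java LinkChecker <url>");
            return;
        }
        try {
            String html = download(args[0]);
            for (String link : extractLinks(html, args[0])) {
                if (isBroken(link)) System.out.println(link);
            }
        } catch (IOException | IllegalArgumentException e) {
            System.err.println("Cannot fetch " + args[0] + ": " + e.getMessage());
        }
    }
}
```

Here the timeouts are set on each connection. The JDK networking properties sun.net.client.defaultConnectTimeout and sun.net.client.defaultReadTimeout are another way to set default values.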



Test on these URLs:

Your final output should demonstrate your program working on these URLs:

https://computing.dcu.ie/~humphrys/computers.internet.links.html
https://computing.dcu.ie/~humphrys/news.links.html
http://humphrysfamilytree.com/links.html
http://humphrysfamilytree.com/sources.html
http://humphrysfamilytree.com/sources.local.html
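
One possible way to gather the results for several pages into a single output table that can be browsed offline is sketched below. The class name BrokenLinkReport, the file name broken.html and the example.org entries are placeholders invented for illustration. The real input would be the broken links your own program finds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BrokenLinkReport {
    // Escape characters that are special in HTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build one table with a row per (page, broken link); duplicates per page are dropped.
    static String render(Map<String, List<String>> brokenByPage) {
        StringBuilder sb = new StringBuilder();
        sb.append("<html><body>\n<table border=\"1\">\n");
        sb.append("<tr><th>Page checked</th><th>Broken link</th></tr>\n");
        for (Map.Entry<String, List<String>> e : brokenByPage.entrySet()) {
            String page = escape(e.getKey());
            Set<String> unique = new LinkedHashSet<>(e.getValue());
            for (String link : unique) {
                String l = escape(link);
                sb.append("<tr><td><a href=\"").append(page).append("\">").append(page)
                  .append("</a></td><td><a href=\"").append(l).append("\">").append(l)
                  .append("</a></td></tr>\n");
            }
        }
        sb.append("</table>\n</body></html>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder input, not real results.
        Map<String, List<String>> results = new LinkedHashMap<>();
        results.put("http://example.org/links.html",
            List.of("http://example.org/missing.html", "http://example.org/missing.html"));
        Files.writeString(Path.of("broken.html"), render(results));
    }
}
```

Opening broken.html in a browser then lets you click each listed link and confirm by hand that it really is broken.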


To hand up:

See What to hand up. Include a printout of the output table when run on the URLs above.



On the Internet since 1987.
