Dr. Mark Humphrys

School of Computing. Dublin City University.


Project ideas

Wikify a page

Can be done in 50 lines of shell or so.

  1. Usage: wikify file.html > file.wiki.html
  2. "Wikifies" file.html, output to stdout

  3. For every capitalised (i.e. possibly a proper noun) and un-linked word Word ...
    • See Parsing XML / HTML
    • Can find capitalised words with grep -o '[A-Z][a-z]*' (-o prints just the matches, one per line, rather than the whole line)
    • Can extract all links with something like:
      cat file.xhtml | xpath '//a[@href]'

  4. ... Link the word to http://en.wikipedia.org/wiki/Word
  5. (We could check if that URL exists, but I don't want this class practical to cause trouble for Wikipedia's servers, so we will not check here.)
  6. Only link the first occurrence of Word, not subsequent occurrences.

  7. Q. How do you avoid Wikifying words inside tags such as these?
    <title> Word Word Word </title>
    <a href=url> Word Word Word </a>

  8. Test on a sample page from the corpus of the works of Shakespeare.
  9. If you pick the same page as another student, I may get suspicious and compare your code.

  10. What to hand up (note: show the HTML source before and after).
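
One possible shape for steps 3 to 7, as a rough sketch and not a model answer. It splits the page into tags and the text between them, copies every tag through unchanged, and only links words in text that is not inside an <a> or <title> element. The function name wikify follows the usage line above. The helper link_text, the variables seen and skip, and the decision to also leave <script> and <style> alone are assumptions of this sketch. It treats a word as a run of ASCII letters, ignores HTML comments and entities, and (as in step 5) never checks that the URL exists.

```shell
#!/bin/sh
# Sketch of a wikify filter. Reads the named files (or stdin), writes to stdout.
wikify() {
    awk '
    BEGIN { base = "http://en.wikipedia.org/wiki/" }

    # Link capitalised words in a run of plain text, first occurrence only.
    function link_text(text,    out, word) {
        out = ""
        while (match(text, /[A-Za-z]+/)) {
            word = substr(text, RSTART, RLENGTH)
            out  = out substr(text, 1, RSTART - 1)
            text = substr(text, RSTART + RLENGTH)
            if (word ~ /^[A-Z][a-z]+$/ && !(word in seen)) {
                seen[word] = 1
                out = out "<a href=\"" base word "\">" word "</a>"
            } else {
                out = out word
            }
        }
        return out text
    }

    # Slurp the whole input, since a tag may span several lines.
    { doc = doc $0 "\n" }

    END {
        skip = 0    # non-zero while inside an element we must not touch
        while ((lt = index(doc, "<")) > 0) {
            rest = substr(doc, lt)
            gt = index(rest, ">")
            if (gt == 0) break                # unterminated tag: treat the rest as text
            before = substr(doc, 1, lt - 1)   # plain text in front of the tag
            tag    = substr(rest, 1, gt)      # the tag itself, copied verbatim
            doc    = substr(rest, gt + 1)
            piece  = skip ? before : link_text(before)
            printf "%s%s", piece, tag
            name = tolower(tag)
            if (name ~ /^<(a|title|script|style)[ \t\n>]/) skip++
            else if (name ~ /^<\/(a|title|script|style)[ \t\n>]/ && skip > 0) skip--
        }
        piece = skip ? doc : link_text(doc)
        printf "%s", piece
    }
    ' "$@"
}
```

To get the usage in step 1, save this in a file called wikify, add a last line wikify "$@" and make the file executable. Because tags are copied through untouched, words inside attributes are never linked. Words that are already linked are skipped but not remembered, so a later plain occurrence of the same word would still get a link; whether that is what you want is part of the exercise.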

