Dr. Mark Humphrys

School of Computing. Dublin City University.

Search engine


  1. Write an offline search engine to search offline web pages in your file system, and produce an offline output web page where you can click on links.
  2. You may not have web pages to search, so we will test it on my sample corpus of the works of Shakespeare.
  3. Call it gweb ("grep web"). Usage like:
    gweb string
  4. It searches the test corpus for the input string.
  5. It writes its output to an (offline) output web page.

Start with this

Start with this template:

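# when testing, comment/uncomment the following line
# comment - output goes to screen
# uncomment - output goes to file

	exec > OUTPUTFILE

# go to the corpus, so that the relative path */*html below resolves

	cd SHAKESPEAREDIR

echo '<pre>'
# -i ignores case; "$1" is the search string given on the command line
grep -i "$1"  */*html
echo '</pre>'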
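
The `exec >` line in the template redirects everything the script prints afterwards. A minimal sketch of that behaviour, assuming a temporary file as a hypothetical stand-in for OUTPUTFILE (the subshell is only there so the demo can print the result afterwards):

```shell
#!/bin/sh
# Hypothetical stand-in for OUTPUTFILE: mktemp picks a scratch file.
out=$(mktemp)

# Run in a subshell so the redirection does not affect the rest of this demo.
(
    exec > "$out"        # from here on, stdout goes to the file
    echo '<pre>'
    echo 'sample hit'
    echo '</pre>'
)

# Back in the parent shell, stdout is the screen again.
cat "$out"
```

With the `exec` line commented out, the same three lines would appear on the screen directly, which is the testing mode the template comments describe.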


  1. Change OUTPUTFILE to the desired output file.
  2. Change SHAKESPEAREDIR to the location of my Shakespeare corpus.
  3. Test that it works with a sample search.

  4. When the above is working, pipe the grep output through a sed command that escapes the HTML tags, so that they are printed as text rather than interpreted by the browser.
  5. Test that it works with a sample search.
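
One possible shape for the sed step above, as a sketch rather than the course's own command: replace the characters a browser treats as markup with HTML entities. The function name `escape_html` and the sample line are assumptions made for illustration.

```shell
#!/bin/sh
# escape_html: hypothetical helper name. It replaces the characters that a
# browser would treat as markup with their HTML entities.
# '&' is handled first, otherwise the '&' in '&lt;' would be escaped again.
escape_html() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Made-up grep hit containing HTML tags.
printf '%s\n' 'sample.html:<A NAME=1>Who is there?</A><br>' | escape_html
```

In gweb the function would sit at the end of the grep line, for example `grep -i "$1" */*html | escape_html`.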


  1. When the above is working:
    Make the files clickable.

  2. The basic grep above gives output like this:

    file.html: hit

  3. Pipe the grep output to a second script called "clickable", which constructs links to the files.
    "clickable" looks like this:
    while read line
    do
      file=`echo "$line" | [CUT BEFORE THE COLON]`
      hit=`echo "$line" | [CUT AFTER THE COLON]`
      echo "<a href=[URL]>[FILE]</a>: [HIT] <br>"
    done
    The bits in [SQUARE BRACKETS] you need to work out yourself.
    See how to use cut with grep output.

  4. You can now click on hits in the output page to see them (offline).
    Check the link works! You may need to adjust the path.
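
As a hint for splitting grep output at the colon, here is a small sketch of `cut` on a single line. The file name and text are made up, and building the URL and the link is still left to you.

```shell
#!/bin/sh
# A made-up line in the "file: hit" shape that grep prints when it
# searches more than one file.
line='histories/sample.html:NORTHUMBERLAND: Why, what is this?'

# Field 1 is everything before the first colon: the file name.
file=$(printf '%s\n' "$line" | cut -d: -f1)

# Fields 2 onward are the hit itself; "-f2-" keeps any later colons.
hit=$(printf '%s\n' "$line" | cut -d: -f2-)

printf 'file=%s\n' "$file"
printf 'hit=%s\n' "$hit"
```

Using `-f2-` rather than `-f2` matters because a line of a play can itself contain colons, and `-f2` alone would cut the hit short at the next one.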


  1. For example:
    gweb northumberland
    will show all lines in the corpus where "northumberland" appears in any case.
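
To see the "in any case" behaviour without the full corpus, the sketch below builds a tiny throwaway corpus and runs the same grep as the template. The directory and file names are hypothetical.

```shell
#!/bin/sh
# Build a tiny throwaway corpus (hypothetical names) with two subdirectories.
corpus=$(mktemp -d)
mkdir "$corpus/histories" "$corpus/tragedies"
printf '%s\n' 'Enter NORTHUMBERLAND' 'My Lord Northumberland,' 'Exit' \
    > "$corpus/histories/a.html"
printf '%s\n' 'Enter HAMLET' > "$corpus/tragedies/b.html"

# Same grep as the template: -i ignores case, and */*html matches html
# files one directory level down.
cd "$corpus" || exit 1
grep -i "northumberland" */*html
```

Both the upper-case and the mixed-case line are reported, each prefixed with its file name because grep was given more than one file.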
