Dr. Mark Humphrys

School of Computing. Dublin City University.

Home      Blog      Teaching      Research      Contact

My big idea: Ancient Brain


CA114      CA170

CA668      CA669      Projects

String searching


grep defines a worldview on UNIX:
  1. Files in text format, not binary.
  2. Not forced to go through other people's applications.
  3. Can directly search the files yourself on the command line.
My search engine based on grep

find on DOS.

But how does grep (or other string-finding programs) work?


Naive (brute force) algorithm

string[0] ... string[n-1]
search for:
pattern[0] ... pattern[m-1]

i = 0  	// start by trying to match from LHS of string
j = 0

 if ( string[i] == pattern[j] )
  i = i - j + 1   	// (*)
  j = 0    			// reset pattern to start
until ( (j >= m) or (i >= n) )

if ( j == m ) return match  		// matches string[i-m] .. string[i-1]
 else return no match

(*) say i = 50 and we have been successfully matching pattern for a bit
started at i = 50, j = 0
now i = 57, j = 7, when suddenly it fails
go to i = 57 - 7 + 1 = 51 and try with j = 0 from there

Performance of naive algorithm

  1. Large alphabet (e.g. Alphanumeric)
    Imagine if pattern = "STING"
    S gets triggered 3 times before start of the match
    ST gets triggered 1 time before the match
    j++ executed 4 times before match
    in typical text search, the if condition above is true only rarely
    pointless j++'s (for aborted matches) are few
    running time is O(m+n)

  2. Small alphabet (e.g. Binary)
    However, if alphabet is small, this scales worse.
    e.g. pattern = 000000001
    string = 000000000000000000000000000000000000000000000000000000000000000100
    worst case is O(mn)

Knuth–Morris–Pratt string search algorithm

Boyer-Moore string search algorithm

What does grep use?

Boyer-Moore is used, for example, in programming language libraries for methods without regular expressions, things like:
substring(pattern,string) - return if pattern is found within string


ancientbrain.com      w2mind.org      humphrysfamilytree.com

On the Internet since 1987.

Wikipedia: Sometimes I link to Wikipedia. I have written something In defence of Wikipedia. It is often a useful starting point but you cannot trust it. Linking to it is like linking to a Google search. A starting point, not a destination. I automatically highlight in red all links to Wikipedia and Google search and other possibly-unreliable user-generated content.