The approved university module specification.
This is the continuous assessment assignment document.
Details of the continuous assessment will be published here.
Copies of the lecture notes in pdf format.
| Section | Part | 1 up A4 slides |
|---|---|---|
| 1: Introduction | | a4pdf |
| 2: Text Retrieval | Part 1 | a4pdf |
| | Part 2 | a4pdf |
| | Part 3 | a4pdf |
| 3: Summarization | | a4pdf |
| 4: Hypertext, Metadata and XML | Part 1 | a4pdf |
| | Part 2 | a4pdf |
| 5: Web Retrieval | | a4pdf |
| Appendix 1: Best-Matching Example | | a4pdf |
Exercises based on material covered in lectures and/or in the lecture notes will appear here.
The homepage of the MIT Media Laboratory. Nicholas Negroponte, the Director of the Media Lab, wrote Being Digital, an interesting book on the development of media technologies. The Media Lab webpages describe many of their projects.
Details and downloadable implementations of Martin Porter's stemming algorithm. Download and try it out in your favourite programming language! Details of stemming algorithms for a range of other languages can be found on the Snowball project page.
Two short articles giving a good basic introduction to Web Search Engines appeared recently in IEEE Computer. This is Web Search Engines: Part 1 (June 2006) and this is Web Search Engines: Part 2 (August 2006).
The Okapi BM25 probabilistic model has been widely adopted in experimental information retrieval systems. Despite this, it is not really covered properly in Modern Information Retrieval. This short Technical Report from the University of Cambridge gives an accessible overview of BM25 and also of Robertson's method of query expansion for relevance feedback.
The City University, London report on Okapi at TREC-3 gives some detail on the development of BM25 from their earlier work and experimental results.
If you are curious about the formal derivation of the Okapi model, you might like to take a look at the original paper from the SIGIR conference in 1994. Note: Some of the material in this paper is quite complex. The material in this paper which goes beyond the lectures is not on the syllabus and will not be examined. I'm providing it here for those of you who may be curious to read more about the underlying theory.
The Text REtrieval Conference (TREC) runs evaluation exercises for many information retrieval tasks. Papers describing the tasks and individual submissions by the participants are freely available from the TREC website. Techniques (or variants of them) covered in the module are used by many participants and these reports will give more detail of how these methods are actually used.
Two articles on image search and retrieval: a general paper on content-based image search, From Pixels to Semantic Spaces; Advances in Content-Based Image Retrieval, and a paper on retrieval using colour features, Similarity of Color Images.
This is the paper by Tombros and Sanderson describing their experiment looking at query-biased summaries for relevance assessment. This paper also describes their sentence-based summarisation system similar to the one outlined in the lectures.
This paper describes the architecture, technology and evaluation of the Video Mail Retrieval (VMR) system. This further paper describes the adaptation of the VMR system for the retrieval of broadcast news.
This paper describes the Video Skimming method from the Informedia project.
A few recent miscellaneous papers on search and retrieval from IEEE Computer: An Information Avalanche, Researchers Make Web Searches More Intelligent, Detecting Ads in Video Streams Using Acoustic and Visual Cues.
The Language Modelling approach to Information Retrieval is very new and had not really emerged when Modern Information Retrieval was published. This is the first paper on this new model, which appeared at the SIGIR conference in 1998. Again, the paper goes beyond the material in the lectures, and only the material in the lectures is examinable. I'm providing the paper here for reading about the subject since it is not covered in any textbooks at present.
Online Book: This is a link to the online version of Information Retrieval by C.J. van Rijsbergen (second edition). This book is a classic IR text; although it is more than 20 years old and has been out of print for many years, most of the material remains relevant today. It is particularly good on indexing, but obviously does not include anything developed in the last 25 years, such as BM25 or issues in multimedia or XML.
Linkage-based approaches to information retrieval on the web are described in a number of now "classic" papers. The papers which introduced the PageRank algorithm originally used in Google are Google paper 1 and Google paper 2. A more detailed analysis is given in PageRank Uncovered.
This is a list of relevant and non-relevant past examination questions from papers set before 2004. All questions set from 2004 onwards are relevant to CA437 for this year. The examination for this year will follow the same style as that from Spring 2004 onwards.
The following past examination papers are available for CA437:
Summer 09 Spring 09 Summer 08 Spring 08 Summer 07 Spring 07 Summer 06 Spring 06 Summer 05 Spring 05 Summer 04 Spring 04
This is a sample paper for CA437 2003/2004.
Summer 03 Spring 03
CA437 replaced the module CA414 Multimedia Information Systems which was available for a number of years. Much of the material in CA437 is derived from CA414. The following past examination papers are available for CA414.
Spring 03 Autumn 02 Spring 02 Autumn 00 Spring 00 Autumn 99 Spring 99 Autumn 98 Spring 98 Autumn 97 Spring 97
The main difference between CA414 and CA437 is that CA437 leaves out much of the material relating to Multimedia Systems.