User-Interfaces for Video Browsing & Searching in CDVP

In the Centre for Digital Video Processing, we demonstrate our research by developing complete systems with front ends, so that we can envisage how our R&D (shot/scene boundary detection, news story segmentation, face/object detection and tracking, linking among similar entities, event detection in films, etc.) might actually be used in a realistic situation where an end user can interact with it and benefit from it. In this vein, the systems we develop are no longer just 'fancy demos' that literally demonstrate the output of internal algorithms: we consider usability in the context of a possible scenario, designing at the visual, functional (adding other related functionality where desired) and reflective levels to make the system both real and usable, and combining the various elements and techniques in a coherent and consistent manner. Prototyping is an important element of our work, letting us see and feel what such a system could look like early in the development process, as in any other proper software development. Video content browsing and searching based on fully automatically processed features is still at an experimental stage in the field, but here we list systems (with their proper interfaces) that are pioneering and shaping the future usage of automatic, content-based video retrieval systems.


Físchlár-TV : An online VCR - recording, browsing and watching TV programmes
The very first Físchlár system, deployed on our university campus, allowed its users to request the recording of any TV programme from 8 channels; once recorded, programmes were processed and indexed to allow browsing and playback in a web browser (User Guide). The system had more than 2,000 registered users at one time. In designing this interface, we started with browsing and recording as the two main features, and took the look-and-feel of a black VCR box as a metaphor. As it was being used by a large number of real users, we had to be careful in upgrading or plugging in any new features. The interface features various within-video browsers, which were the subject of a PhD thesis: see a concise illustration.

Designing the User Interface for the Físchlár Digital Video Library. Lee H and Smeaton A.F. Journal of Digital Information, 2(4), 2002 (JoDI online).

Físchlár-Nursing : An online archive of Nursing-related video materials
This is a variation of the above system, but it contains a fixed set of nursing-related materials for teaching and learning in the School of Nursing. The overview of each programme is manually generated to provide a good Table of Contents for each video. The interaction stages are exactly the same as in Físchlár-TV. The User Guide shows what programmes are currently available for browsing and watching.

Físchlár-Nursing: Using Digital Video Libraries to Teach Processes to Nursing Students. Gurrin C, Browne P, Smeaton A.F, Lee H, Mc Donald K and MacNeela P. WBE 2004 - IASTED International Conference on Web-Based Education, Innsbruck, Austria, 16-18 February 2004.

Físchlár-News : An online news story archive of daily RTE 9pm news
Based on automatic news story segmentation, the system automatically records and indexes the news and provides story-level access to its users (User Guide). Currently more than 6,500 news stories (starting from April 2003) are in the database, all searchable, browseable and playable in a web browser. In Spring 2004, an extensive usage study was conducted with this system, monitoring 16 regular users for 1 month while they kept a diary. The system also comes with a mobile interface for PDAs such as the iPAQ and XDA: its primary access method is a personalised story list, to reduce intensive interaction in a mobile environment.

User Evaluation of Físchlár-News: An Automatic Broadcast News Delivery System. Lee H, Smeaton A.F, O'Connor N and Smyth B. ACM TOIS - ACM Transactions on Information Systems, 2006.

Físchlár-TREC2002 : Searching through shots containing simple semantic features
An experimental system that can search through videos by indoor/outdoor setting, presence of faces, people, audio and in-video captions (open captions), and also by text transcribed by automatic speech recognition. Using a 4-colour scheme assigned to the groupings of features, and an icon for each feature, the interface combines these complicated elements into an organised, coherent visualisation and interaction.

Design, Implementation and Testing of an Interactive Video Retrieval System. Gaughan G, Smeaton A.F, Gurrin C, Lee H and Mc Donald K. MIR 2003 - 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, Berkeley, CA, 7 November 2003.

Físchlár-TREC2003 : Searching video by relevance feedback of a keyframe
This experimental system features relevance feedback, in which a user can add any keyframe encountered during browsing to the query for subsequent searching, implementing 'find more like this'. Text from automatic speech recognition is also searchable, and the weighting between text and image can be specified by the user. Particular effort was made to relate the add-to-query button (under each keyframe) to the query box where the keyframes will be added, by using a colour scheme.
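The user-specified weighting between text and image evidence can be illustrated with a small sketch. This is not the Físchlár-TREC2003 implementation: the function names, the min-max normalisation and the linear weighted sum are all assumptions chosen for illustration, and the scores are made-up values.

```python
def min_max_normalise(scores):
    """Rescale a dict of shot_id -> raw score into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every shot as equally strong evidence.
        return {shot: 1.0 for shot in scores}
    return {shot: (s - lo) / (hi - lo) for shot, s in scores.items()}


def fuse_scores(text_scores, image_scores, text_weight=0.5):
    """Rank shots by a weighted sum of text and image evidence.

    text_weight is the (hypothetical) user-chosen balance:
    1.0 means text only, 0.0 means keyframe similarity only.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    text_norm = min_max_normalise(text_scores)
    image_norm = min_max_normalise(image_scores)
    shots = set(text_norm) | set(image_norm)
    combined = {
        shot: text_weight * text_norm.get(shot, 0.0)
        + (1.0 - text_weight) * image_norm.get(shot, 0.0)
        for shot in shots
    }
    # Highest combined score first; ties broken by shot id for stable output.
    return sorted(combined, key=lambda shot: (-combined[shot], shot))


# Illustrative scores only: speech-transcript match vs. keyframe similarity.
text_scores = {"shot_a": 2.0, "shot_b": 1.0, "shot_c": 0.0}
image_scores = {"shot_a": 0.0, "shot_b": 0.5, "shot_c": 1.0}
text_only = fuse_scores(text_scores, image_scores, text_weight=1.0)
image_only = fuse_scores(text_scores, image_scores, text_weight=0.0)
```

Moving the weight from one extreme to the other reverses the ranking in this toy example, which is the kind of control the interface hands to the user.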

Dublin City University Video Track Experiments for TREC 2003. Browne P, Czirjek C, Gaughan G, Gurrin C, Jones G, Lee H, Marlow S, Mc Donald K, Murphy N, O'Connor N, O'Hare N, Smeaton A.F, and Ye J. TRECVID 2003 - Text REtrieval Conference TRECVID Workshop, Gaithersburg, Maryland, 17-18 November 2003.

Físchlár-TREC2004 : Searching video by relevance feedback based on low-level features of a keyframe
A refinement of the TREC2003 system. In terms of features, the user can now specify a particular aspect of the keyframe in relevance feedback ('find video clips where the colour/texture/edge of the keyframe is similar to this one'). In terms of interface, the colour scheme and widget-level design have improved considerably to give a more product-like, aesthetically pleasing effect, and a saved shot list has been added at the right end of the screen showing what the user has collected so far. This is probably one of the best examples of a large amount of different kinds of information being displayed on a single screen, yet properly grouped, separated and colour-coded so that it manages to look simple and uncluttered.

Físchlár-TRECVid2004: Combined Text- and Image-Based Searching of Video Archives. O'Connor N, Lee H, Smeaton A.F, Jones G, Cooke E, Le Borgne H and Gurrin C. ISCAS 2006 - IEEE International Symposium on Circuits and Systems, Kos, Greece, 21-24 May 2006.


Interaction with all of the above interfaces is based on whole keyframes (or segments of a video). Sub-keyframe interaction is enabled only in the TREC2004 system above, in which regional colour and edge can be specified by the user in querying, but these still remain low-level features. The L'OEUVRE Project, which investigates object-based operations in video, moves the state of the art forward, and the interface design has started moving forward accordingly. The following designs allow users to interact with objects detected in video content: viewing what has been detected; selecting one of the detected objects in a keyframe; using a selected object for subsequent querying. Based on this bottom-level interaction with objects as a 'unit representation', overall interfaces have been designed for different contexts and possible usage.

L'OEUVRE: Interacting with Objects : Linking among similar objects in video through buttons
This is the initial conceptual sketch of how object-level interaction should be presented to the user. Here we used small, oval buttons to represent each of the detected objects in a keyframe, so interaction with an object happens indirectly, by mousing over or clicking its button. This concept has been integrated into a more realistic context in all the following designs we did for object-based interaction.

Object & Link Filter Interface : Allowing link filters for object classes
This is based on the L'OEUVRE interface above, but is more advanced in that various object classifications (people's names, objects, object classes, scene/background, and actions) are also performed automatically, and links can be filtered based on them. Quite futuristic in terms of automatic features.

CCTV Archive Search Interface : Allowing efficient search through large CCTV video archive
Again based on objects and unit representation, this one is applied to a security and surveillance system. DCU's CCTV cameras continuously record and archive a large volume of footage, and security staff need to search through this archive when, for example, a theft has happened. Knowing only an approximate time/date and location, the search still takes a long time and may miss other potentially useful information captured by nearby cameras around that time, all of which could be valuable in reconstructing the event (forensic analysis). This interface efficiently indicates wanted people/objects in the archive, searches for the same person/object in other nearby cameras, and visualises the results on a map, highlighting their trails by time, supporting efficient forensic analysis of an event.
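The idea of assembling a person's trail across nearby cameras can be sketched as follows. This is only an illustration under stated assumptions: the `Sighting` record, the `build_trail` function and the camera names are hypothetical, and the sketch presumes that object detection and person matching have already labelled each sighting with a person identifier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sighting:
    """One detection of a person by one camera (hypothetical record)."""
    person_id: str
    camera_id: str
    seen_at: datetime


def build_trail(sightings, person_id, nearby_cameras, around, window):
    """Return one person's sightings on nearby cameras, oldest first.

    Only sightings within `window` before or after `around` are kept,
    mirroring a search that starts from an approximate time and location.
    """
    start, end = around - window, around + window
    matches = [
        s for s in sightings
        if s.person_id == person_id
        and s.camera_id in nearby_cameras
        and start <= s.seen_at <= end
    ]
    return sorted(matches, key=lambda s: s.seen_at)


# Illustrative data only.
incident = datetime(2005, 6, 7, 14, 0)
archive = [
    Sighting("p1", "library_door", datetime(2005, 6, 7, 14, 5)),
    Sighting("p1", "car_park", datetime(2005, 6, 7, 13, 50)),
    Sighting("p2", "car_park", datetime(2005, 6, 7, 13, 55)),
    Sighting("p1", "sports_hall", datetime(2005, 6, 7, 14, 2)),
    Sighting("p1", "library_door", datetime(2005, 6, 7, 18, 0)),
]
trail = build_trail(
    archive, "p1", {"library_door", "car_park"}, incident, timedelta(minutes=30)
)
trail_cameras = [s.camera_id for s in trail]
```

The ordered list of cameras is what a map view would need in order to draw the trail by time.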

User-Interface to a CCTV Video Search System. Lee H, Smeaton A.F, O'Connor N and Murphy N. ICDP 2005 - IEE International Symposium on Imaging for Crime Detection and Prevention, London, U.K., 7-8 June 2005.

Object-based Query Formulation : An object and their features as relevance feedback
This system allows the user to compose a query based on objects and their low-level attributes (colour, texture and shape). Query formulation incorporates the unit representation idea from the L'OEUVRE interaction design above, but the accumulated user queries become relevance feedback, incrementally refining the automatic object classification in the database. We paid particular attention to the feature feedback buttons, as these are the most important query element of the system. Developed by Sorin.

Interactive Object-based Retrieval Using Relevance Feedback. Sav S, Lee H, O'Connor N and Smeaton A.F. Acivs 2005 - Advanced Concepts for Intelligent Vision Systems, Antwerp, Belgium, 20-23 September 2005.

Advanced Object-based Query Interface : Automatically splitting query objects into similar groups
This is an advanced version of the previous interface. As the user adds more and more example objects (and their features) to the query, examples that are not very similar to each other tend to confuse the system and lead to poor retrieval results. Adding any example object is a legitimate action allowed by the interface, yet we do not want a user to add semantically very different objects to one query, so the solution provided in this interface is for the system to suggest possible clusters among the query objects. The separated clusters can then be searched individually, allowing more focused searching and therefore better results.
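One simple way to suggest clusters among query objects is sketched below. The source does not say which clustering method the interface uses; this sketch assumes a greedy "leader" clustering over made-up two-dimensional feature vectors, and the function names and distance threshold are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def suggest_clusters(query_objects, max_distance):
    """Group query objects whose feature vectors lie close together.

    query_objects maps an object name to its feature vector. Each object
    joins the first cluster whose leader (first member) is within
    max_distance; otherwise it starts a new cluster.
    """
    clusters = []  # list of (leader_vector, member_names)
    for name, vector in query_objects.items():
        for leader, members in clusters:
            if euclidean(leader, vector) <= max_distance:
                members.append(name)
                break
        else:
            clusters.append((vector, [name]))
    return [members for _, members in clusters]


# Illustrative features only, e.g. (redness, greenness) of each object.
query = {
    "red_car": (1.0, 0.0),
    "red_van": (0.9, 0.1),
    "green_tree": (0.0, 1.0),
}
groups = suggest_clusters(query, max_distance=0.5)
```

Each suggested group could then be submitted as its own, more focused query.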

Using Segmented Objects in Ostensive Video Shot Retrieval. Sav S, Lee H, Smeaton A.F. and O'Connor N. AMR 2005 - 3rd International Workshop on Adaptive Multimedia Retrieval, Glasgow, U.K., 28-29 July 2005.

Film Event Browser : Searching for action, dialogue, and montage scenes in movies
'Scene detection' is still a not fully explored area, except for special genres such as news stories or sports event scenes. The system, developed by Bart, automatically finds where exciting (action) scenes, dialogue scenes or montage scenes occur in films, and helps the user quickly spot them. These three scene types are 'prescribed' sets of parameters pre-defined for the user; a more generic query formulation is also provided, in which the user can adjust these parameters to customise the query beyond action/dialogue/montage scenes.

Searching Movies based on User Defined Semantic Events. Lehane B, O'Connor N and Lee H. SIGMAP 2006 - International Conference on Signal Processing and Multimedia Applications, Setubal, Portugal, 7-10 August 2006.

BBC Rushes Explorer : Object-based query using any external images
It is good to provide a Relevance Feedback (RF) feature so that a relevant image/object in the database can be used to formulate a subsequent query, but the starting point (the initial query) is often not supported and relies on a text query. BBC Rushes Explorer allows the user to use any external image search engine (e.g. Google Image Search) and incorporate those images directly into the system to formulate the initial query, as well as using the database images/objects (50 hours of BBC rushes footage, as part of our participation in the TRECVid2005 activity). The incorporated images can be further segmented into objects by the user, and an object is then used for an RF query. With the interactive object segmentation interface and the object-based RF feature of this system, the overall interaction is smooth, and the user's RF is open to *any* image or object outside of the database.

Interactive Experiments in Object-Based Retrieval. Sav S, Jones G, Lee H, O'Connor N and Smeaton A.F. CIVR 2006 - 5th International Conference on Image and Video Retrieval, Tempe, AZ, 13-15 July 2006.

Físchlár-DiamondTouch : Collaborative video searching on a TableTop
Moving away from the conventional mouse, keyboard and monitor for single-user video searching, Físchlár-DT is a TableTop video search interface on top of the Físchlár Digital Video System. The Tabletop is based on the DiamondTouch table, with the DiamondSpin software toolkit. Designing a collaborative interface on which two users work together to search for video shots requires considering interesting issues such as division of the task between the users, workspace awareness, and widget/object coordination policy between the users. We explore these issues by designing, implementing and experimenting with users, and assessing the outcome in terms of search performance, the amount/kinds of interaction between the users, and their personality type matching. The system was developed by Colum and Sinéad, and demonstrated at TRECVid2005 in Gaithersburg, Maryland, in November 2005.

Collaborative Video Searching on a Tabletop. Smeaton A.F, Lee H, Foley C and McGivney S. Multimedia Systems Journal, Springer, 2006.

MediAssist : Online Personal Photo Management
Neil's cool Web 2.0 application is online photoware that uses automatic photo analysis techniques to index and organise a large number of personal photos very easily. Photo content analysis such as face detection and labelling and building detection, combined with context analysis such as the GPS data and shutter speed recorded with the photo, is used to automatically annotate the photos. The user can then see the automatic annotations and confirm or correct them. A mobile interface for browsing and searching photos on this system has also been developed and experimented with.

MediAssist: Using Content-Based Analysis and Context to Manage Personal Photo Collections. O'Hare N, Lee H, Cooray S, Gurrin C, Jones G, Malobabic J, O'Connor N, Smeaton A.F and Uscilowski B. CIVR 2006 - 5th International Conference on Image and Video Retrieval. Springer Lecture Notes in Computer Science Vol. 4071/2006. Tempe, AZ, 13-15 July 2006. (pp529-532)

My Visual Diary : SenseCam Image Browser
SenseCam is a wearable digital camera with built-in sensors that automatically trigger photo capture when something happens in the wearer's surroundings. In a usual day, wearing a SenseCam will result in 1,500 - 3,000 photos, in effect visually archiving a person's day. With this many photos, organising, annotating and retrieving them becomes a headache. In CDVP we develop automatic organisation tools to group the photos into the individual events of the day, to determine repeating or unique patterns among the events, and to establish similarities among those events. My Visual Diary is a web-based interface that presents the results of this automatic analysis as an interactive, comic-book style layout of the photos. See more information.
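Grouping a day's photos into events can be illustrated with a deliberately simple sketch. The source does not describe how the CDVP tools detect events, so the rule used here, starting a new event whenever the gap between consecutive capture times exceeds a threshold, is purely an assumption for illustration; the function name and the 10-minute default are hypothetical.

```python
from datetime import datetime, timedelta


def group_into_events(timestamps, max_gap=timedelta(minutes=10)):
    """Split photo capture times into events separated by long gaps.

    A photo joins the current event when it was taken within max_gap of
    the previous photo; otherwise it starts a new event.
    """
    events = []
    for taken_at in sorted(timestamps):
        if events and taken_at - events[-1][-1] <= max_gap:
            events[-1].append(taken_at)
        else:
            events.append([taken_at])
    return events


# Illustrative capture times only.
day = datetime(2006, 10, 14)
photos = [
    day.replace(hour=9, minute=0),
    day.replace(hour=9, minute=4),
    day.replace(hour=9, minute=9),
    day.replace(hour=12, minute=30),
    day.replace(hour=12, minute=33),
]
events = group_into_events(photos)
event_sizes = [len(event) for event in events]
```

Each resulting group could then be summarised by a representative photo in a comic-book style layout.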

Adaptive Visual Summary of LifeLog Photos for Personal Information Management. Lee H, Smeaton A.F, O'Connor N and Jones G. AIR Workshop - 1st International Workshop on Adaptive Information Retrieval, Glasgow, Scotland, 14 October 2006.

Body Sensor Visualisation : Sports Analysis using Body Sensor Data
A soccer player is wired with a BodyMedia device that captures a number of body response measurements, along with a GPS device, while two video streams capture his movements; all of this data is recorded and analysed at the end of the game. The data is analysed, synchronised and presented in an interface that displays the sensed data beside the video and the player's location on the soccer field, all synchronised. Juxtaposing different sources of data with different characteristics (but connected by the time dimension) can reveal many interesting and potentially useful facts - the main benefit of any visualisation. The interface was implemented by Kirk.

Aggregating Multiple Body Sensors for Analysis in Sports. Smeaton A.F, Diamond D, Kelly P, Moran K, Lau K, Morris D, Moyna N, O'Connor N and Zhang K. pHealth 2008 - International Workshop on Wearable, Micro and Nano Technologies for the Personalised Health, Valencia, Spain, 21-23 June 2008.

Mo Músaem Fíorúil - My Virtual Museum : Museum artefact browser
A museum visitor takes a number of photos of the exhibited artefacts, comes back home, and gets confused about which photo shows which artefact, unable to remember which artefact had which history or title. She uploads all these photos to the system, and it automatically identifies the same artefacts across many photos, categorises them, and links them to the museum website's authentic photos and information. The user can manually correct any incorrect categorisation by simply dragging items between groupings - the dynamic Flash interface doesn't flash; it is there only to enhance usability in a quiet way. Back-end engine researched and developed by Michael, Flash interface implemented by Sorin.

Mo Músaem Fíorúil: A Web-based Search and Information Service for Museum Visitors. Blighe M, Sav S, Lee H and O'Connor N. ICIAR 2008 - International Conference on Image Analysis and Recognition, Povoa de Varzim, Portugal, 25-27 June 2008.

Centre for Digital Video Processing, Dublin City University 2008