School of Computing
Research Postgraduate Seminar Series

 

Date Speaker Title
Tues. 30/11/04 Ruth O'Donovan Automatic Extraction of Large-Scale, Multilingual Lexical Resources
Thurs. 2/12/04 Thomas Koller Creation and evaluation of a plurilingual dictionary tool
Tues. 7/12/04 Noreen Quinn Modelling the Liveweight of Irish Dairy Cows
Thurs. 16/12/04 Noel McCullagh A New Identity Based Two Party Key Agreement
Thurs. 13/1/05 Niall McMahon Wind Energy - An Overview
Tues. 18/1/05 John Judge Boosting parser performance in specific domains
Thurs. 20/1/05 Puspita Deo Space requirement implications for Heterogeneous Traffic Manoeuvres
Tues. 25/1/05 Katrina Keogh Modern Foreign Languages and ICT in the Primary School Environment
Thurs. 27/1/05 Patricia Gunning Proposed Estimation Method for Financial Auditing
Tues. 1/2/05 Neill Sweeney Computer Poker and Opponent Modelling
Thurs. 3/2/05 Claire Kenny Automated Tutoring for a Database Skills Training Environment
Tues. 8/2/05 Andrew Errity Analysis and Synthesis of Speech based on Nonlinear Dimensionality Reduction
Thurs. 10/2/05 Fabrice Camous Genomic Information Retrieval Using Links
Tues. 15/2/05 Paul Ferguson Terabyte Search: Large Scale Web Search Experiments for TREC 2004
Thurs. 17/2/05 Mark Melia Ontology-based Adaptive Content Navigation
Tues. 22/2/05 Bernard Gorman Imitation Learning in Interactive Computer Games
Thurs. 24/2/05 Colm O'hEigeartaigh Recent breakthroughs against SHA-1
Thurs. 3/3/05 Declan Groves Hybrid SMT: Robust Sub-Sentential Alignment of Phrase-Structure Trees
Tues. 8/3/05 Adel Sharkasi Long-Term Memory in the Irish Market (ISEQ): Additional Insight from Wavelet Transform
Thurs. 10/3/05 Ana Barat Monte Carlo and Cellular Automata methods for Simulation of Drug Dissolution
Tues. 22/3/05 Cara Greene Computer-Assisted Language Learning (CALL) for Dyslexic Learners
Thurs. 24/3/05 Humayun Kabir Poitin: Distilling Theorems from Programs
Thurs. 31/3/05 Ronan Barrett Web Service Technologies and Real World Applications
Thurs. 7/4/05 Anna Khasin TransBooster: piece-wise machine translation
Tues. 12/4/05 Karen Bolger Project Management Failure - Does a practical solution exist?
Thurs. 14/4/05 Bart Mellebeek Transbooster: boosting the performance of existing MT by complex sentence reduction.
Tues. 19/4/05 Sandra Rothwell Automatic Summarisation of Digital Video based on Analysis of Musical Score
Thurs. 21/4/05 Martin O'Connor Level-based Indexing for Optimising XML Queries
Thurs. 28/4/05 Ciaran Ferry Consistency management mechanisms in Internet content delivery
Thurs. 5/5/05 Noel King A Metadata Repository for Facilitating Process Composition
Tues. 10/5/05 Georgina Gaughan Finding New News: Novelty Detection in Broadcast News
Thurs. 12/5/05 Claire Wheelan Side Channel Analysis of Pairing-Based Cryptography
Tues. 17/5/05 Joachim Wagner Probabilistic detection of ungrammatical sentences
Thurs. 19/5/05 Biren Patnaik Distillation: Higher Order Transformation for Higher Order Programs

 

All seminars will take place in L221 at 4pm.

 


Abstracts:

Automatic Extraction of Large-Scale, Multilingual Lexical Resources

Comprehensive lexicons are crucial for wide-coverage parsers and machine translation engines using modern syntactic theories such as Lexical-Functional Grammar (LFG). One part of the lexicon is the description of the subcategorisation requirements for all predicates - that is, the arguments that the predicate must have in order to form a grammatical construction. Manually constructing a lexicon is time-consuming, error-prone and expensive and has to be done anew for every language. My research focuses on automatically and efficiently building large-scale, high-quality lexical resources for multiple languages. I will present a methodology for extracting subcategorisation information from the Penn-II and Penn-III Treebanks which have been automatically annotated with LFG f-structures. My approach allows us to control the level of detail in the frames: for example whether particles and prepositions are specified. It differentiates between active and passive frames and fully reflects long distance dependencies in the source data structures. The extracted frames can be filtered to optimise coverage or quality. In contrast to many other approaches, frames are learned from the data rather than predefined. I have carried out a large-scale evaluation of the entire extracted lexicon against the COMLEX resource. To my knowledge, this is the largest evaluation of subcategorisation frames for English. The extraction technique for English has been successfully migrated to Spanish, German and Chinese treebanks despite typological differences and variations in treebank encoding.


Creation and evaluation of a plurilingual dictionary tool

In my research I am developing plurilingual language learning software for French, Italian and Spanish. The plurilingual learning approach exploits learners' knowledge of similar languages and aims to teach the languages contrastively.

One problem of existing plurilingual learning materials is the limited number of available texts to work with. Therefore I decided to develop a plurilingual dictionary tool which allows the learner to enter any text in one of these languages. Currently the tool is a stand-alone dictionary but later it will be integrated into different language learning activities.

In my talk I will first give an overview of the features of the dictionary tool. Then I will describe the technical side of development, focusing on how data storage and processing are done with XML, Flash and PHP. Finally I will present some evaluation results.

If you want to test the dictionary tool beforehand just point your browser to
http://www.computing.dcu.ie/~tkoller/phd/dict/dict.htm


Modelling the Liveweight[1] of Irish Dairy Cows

Many researchers have modelled cows’ liveweight from calving to maturity using growth curves, while others have modelled liveweight using body measurements. The objective of this study, however, is to derive a biologically interpretable equation that models the liveweight of Irish dairy cows over a lactation period. The dataset consists of liveweight recordings taken on a weekly or monthly basis throughout 12,496 lactations of spring-calving cows from 83 herds. As the data used in this study are time series data, some time series techniques were examined initially. Splines were also examined to indicate how many dimensions were necessary to fit the dataset. As the liveweight curve is similar to an inverted milk yield curve, the models that were used to predict milk yield were also tested. Ultimately, liveweight changes between two calvings were modelled as a function of age, lactation and pregnancy. As multicollinearity was a severe problem with this function, the variance inflation factor was examined to find out which variables contributed to it, and principal component analysis was carried out on the variables responsible for severe multicollinearity. The new model has an R² of 0.76, the effect of multicollinearity is weak and the residuals are normal, homoskedastic and independent. This new model therefore provides an acceptable level of accuracy in representing the shape of the liveweight curve for Irish dairy cows.

[1] The liveweight of a cow is the weight of a live cow.


A New Identity Based Two Party Key Agreement


This talk is in the area of identity based cryptography. Those of you who were at my transfer talk may remember that identity based cryptography is a relatively new branch of cryptography in which one may transform a public identifier into a PKI-style public key using a well-known algorithm. There are currently two different public key generation algorithms for transforming an identifier into a public key: the first was developed by Boneh and Franklin, and a newer variant was developed by Sakai and Kasahara. We will look at the differences between these algorithms and use the second as the basis for an authenticated key agreement.

A two party key agreement is simply a series of steps performed by two entities so that they can agree a shared secret over an insecure channel (such as the Internet). This secret is usually used as the key for a symmetric encryption algorithm and so may be quite short (160 bits). The aim is that any other entity able to see all of the messages that pass between the two entities still cannot calculate the shared secret.
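
The general idea of a two party key agreement can be sketched with classic Diffie-Hellman. This is not the identity based, authenticated protocol presented in the talk: it is a plain, unauthenticated exchange, and the parameters `P` and `G` and the function names are assumptions chosen only for illustration.

```python
import secrets

# Toy public parameters (assumptions for illustration, not for real use).
P = 2**127 - 1   # a Mersenne prime used as the modulus
G = 3            # public base

def keypair():
    """Return (private exponent, public value G^private mod P)."""
    private = secrets.randbelow(P - 3) + 2   # uniform in [2, P - 2]
    return private, pow(G, private, P)

def shared_secret(own_private, peer_public):
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, own_private, P)

# Each party generates a key pair and sends only the public value.
a_priv, a_pub = keypair()
b_priv, b_pub = keypair()

# Both sides arrive at the same secret without ever transmitting it.
alice_key = shared_secret(a_priv, b_pub)
bob_key = shared_secret(b_priv, a_pub)
```

An eavesdropper sees only `a_pub` and `b_pub`. Because nothing here authenticates the two parties, this sketch does not protect against an active attacker, which is the gap an authenticated key agreement is meant to close.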


Wind Energy - An Overview

Click here for details.

Boosting parser performance in specific domains

Parsing is a well-researched area in natural language processing. To date many high performance probabilistic parsers have been developed. These parsers are trained on treebank resources and suffer a drop in performance when confronted with text types different to the training data. Through a number of experiments I have shown that the parsers suffer an even greater drop in performance when tested on particular sentence types, in this case questions. I will present some of my work on improving parser performance on direct questions, and an overview of a method to semi-automatically create training resources to boost parser performance in particular areas, using direct questions as an example.

Space-requirement Implications for Heterogeneous Traffic Manoeuvres

A prototype micro-simulation model is presented for heterogeneous motorised traffic in an urban context. In the first instance, the heterogeneous traffic mix consists of short (single-unit length) vehicles and long (double-unit length) vehicles. Vehicle manoeuvrability at an urban single-lane unsignalised intersection is considered for different traffic distributions and a range of arrival rates, using a minimum acceptable-space rule. For long vehicles, this implies occupation of multiple road cells. The impact on throughput (the number of vehicles that navigate through the intersection in a given time) and capacity (the number of vehicles passing from an entrance road on to the intersection per unit time) for the geometry is considered. Overall throughput and capacity in a TWSC (Two-Way Stop Control) intersection are found to depend on the arrival rate of long vehicles as well as the arrival rates on the major roads. The occupation of more than one cell by a single vehicle not only affects these quantities directly, but also controls off-arterial flow.

Modern Foreign Languages and ICT in the Primary School Environment

My current research involves the development of an Intelligent Computer-Assisted Language Learning tool to help children in the primary school environment learn German. This presentation sketches the framework around my research, looking at Modern Foreign Languages (MFL) in primary schools in Europe and Ireland, as well as the Irish Department of Education and Science’s Information and Communications Technology (ICT) drive.

After a tremulous start, MFLs have taken root in primary school education in Europe. Comparatively speaking, Ireland’s current Modern Languages in Primary Schools Initiative is in its infancy, and consideration must be given to the lessons that can be learned from our European counterparts.

The integration of ICT into all sectors of education has been a Department of Education and Science priority since 1997. Current ICT policies complement the basic methodologies within existing language curricula theory, making a successful melding of ICT with languages possible.

Proposed Estimation Method for Financial Auditing

Auditors often sample account balances to determine the accuracy of the published financial statements.  The auditor typically tests the hypothesis that the total error amount in the population exceeds a certain critical level called the materiality amount.  The test is usually conducted by constructing an upper confidence bound for the total error amount.  If this upper bound equals or exceeds the materiality amount, further investigation of the accounts is warranted.  Early researchers first attempted to construct an upper confidence bound for the total error amount in an auditing population using classical procedures.  These methods proved unsuccessful, however, because of the skewed nature of the population and the rare incidence of occurrence of the study variable; the confidence intervals for the means and totals were found to be unreliable, i.e. the coverages were below nominal levels.

The Stringer (1963) method is a non-classical heuristic procedure, based on the Poisson distribution, for computing an upper bound on the total error in a population.  All simulations done by various researchers show the Stringer bound to give coverage above nominal levels.  Furthermore, all known simulation studies find the bound to be too conservative, with its value being far in excess of the true population error amount.  Despite its conservativeness, the Stringer bound is perhaps the best known in auditing and has been used extensively in the literature as a standard of comparison for other methods.

In this talk I will show how a modification to the Stringer method results in a considerable reduction in its conservatism.
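
For orientation, the unmodified Stringer bound can be sketched as below. This is a minimal illustration and not the modified method from the talk. It assumes the commonly cited form of the bound, with Poisson upper confidence limits and taintings (error divided by book value) sorted in descending order; all function and parameter names are hypothetical.

```python
import math

def poisson_cdf(j, lam):
    """P(X <= j) for X ~ Poisson(lam)."""
    term = math.exp(-lam)
    total = term
    for k in range(1, j + 1):
        term *= lam / k
        total += term
    return total

def poisson_upper_limit(j, confidence=0.95):
    """Upper confidence limit on a Poisson mean after observing j events.

    Solves P(X <= j; lam) = 1 - confidence for lam by bisection.
    """
    lo, hi = 0.0, j + 50.0
    target = 1.0 - confidence
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(j, mid) > target:
            lo = mid   # CDF still too large, so lam must increase
        else:
            hi = mid
    return (lo + hi) / 2

def stringer_bound(book_value, n, taintings, confidence=0.95):
    """Stringer-style upper bound on the total error amount.

    book_value: total recorded value of the population
    n:          number of sampled monetary units
    taintings:  error / book value for each sampled item found in error
    """
    t = sorted(taintings, reverse=True)
    limits = [poisson_upper_limit(j, confidence) / n for j in range(len(t) + 1)]
    rate = limits[0] + sum((limits[j] - limits[j - 1]) * t[j - 1]
                           for j in range(1, len(t) + 1))
    return book_value * rate
```

With no errors found in a sample of 100 units from a population with a book value of 1,000,000, the 95% bound is 1,000,000 × 2.9957 / 100, roughly 29,957. The bound is compared with the materiality amount to decide whether further investigation is warranted.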

Computer Poker and Opponent Modelling

Poker is the classic game of asymmetric information where the players each have different parts of the relevant information. This adds an extra dimension to the game not present in games of perfect information; it is important to use previous decisions by opponents to deduce the hidden information.

Finding classic equilibrium solutions (Minimax/Nash) for the popular variants is a non-trivial problem. Of more practical interest is modelling the play of typical opponents and exploiting this model to increase the overall expected return. The techniques of reinforcement learning would seem to be natural here as there exists a clear reward signal (i.e. chips won/lost). Good feature selection will be vital, as it so often is.


Automated Tutoring for a Database Skills Training Environment

Universities are increasingly offering courses online. Feedback, assessment and guidance are important features of this online courseware. Together, in the absence of a human tutor, they aid the student in the learning process.

I will present a programming training environment for a database course. It aims to offer a substitute for classroom based learning by providing synchronous automated feedback to the student, along with guidance based on a personalised assessment. The automated tutoring system should promote procedural knowledge acquisition and skills training.

Analysis and Synthesis of Speech based on Nonlinear Dimensionality Reduction

Many problems in pattern recognition begin with the dimensionality reduction of raw high dimensional signals, for example speech waveforms or images of faces. Classical forms of dimensionality reduction, such as PCA and MDS, are limited by the assumption that the data lies in a linear subspace. Recently a number of nonlinear dimensionality reduction techniques have been proposed, including locally linear embedding (LLE) and isometric feature mapping (ISOMAP), which overcome this limitation.

My current research involves investigating the application of these nonlinear dimensionality reduction techniques to speech. If the acoustic variability of a speech data set can be described by a small number of features then we can view the data as lying on a low dimensional manifold in the high dimensional space of speech waveforms. In this talk I will describe the LLE algorithm and how it can be applied to discover low dimensional embeddings of speech data. I shall also discuss our approach to learning a nonlinear mapping from the low dimensional embedding space to the high dimensional ‘speech space’ with a view to speech synthesis and speaker transformation.

Genomic Information Retrieval Using Links

Nowadays a significant amount of biologists' work is spent outside of the wet labs searching genomic databases. Such searches can be tedious and involve navigation through a network of heterogeneous data, following links in a similar way to that in which we browse the internet. The goal of this project is to use the links to enable search capabilities over heterogeneous genomic databases, much as Google does for the web. This approach will reduce the amount of navigation necessary and deliver a response that integrates diverse genomic data.

The talk will describe the heterogeneous genomic database environment, the different types of data involved and the different types of links. Preliminary results on experiments using MeSH (Medical Subject Headings) data will be presented, along with future research directions.

Terabyte Search: Large Scale Web Search Experiments for TREC 2004

As the size of the web increases, developing an effective search engine to deal with such large numbers of documents becomes a major task. The presence of a new Terabyte track in TREC 2004 is just one example of how important large-scale retrieval has become.

In the seminar I will talk about my work since the development of a large-scale search engine. This will include a description of my group's participation in TREC, and subsequent experiments and improvements made to the engine. I will discuss my current research into the modification of the search engine's architecture as well as the evaluation of alternative indexes.

Ontology-based Adaptive Content Navigation

Ontologies are a key component of the Semantic Web initiative. Ontology frameworks represent a method for modelling knowledge. I aim to show how ontologies can be used in automated course creation and how adaptivity can be exploited by using an ontological knowledge base. I also hope to demonstrate how a learner can gain knowledge by browsing the ontology in a pedagogically sound manner, using learning resources associated with concepts within the ontology.

Imitation Learning in Interactive Computer Games

Despite a history of games-based research, academia has generally regarded commercial games as a distraction from the serious business of AI, rather than as an opportunity to leverage this existing domain to the advancement of our knowledge. Similarly, the computer game industry still relies on techniques that were developed several decades ago, and has shown little interest in adopting more progressive academic approaches. In recent times, however, these attitudes have changed, as each side begins to recognise the potential offered by the other; under- and post-graduate games development courses are increasingly common, while the industry itself is turning slowly but surely towards the more modern AI fields of machine learning and pattern recognition. One area which has not yet received much attention is imitation learning, a subdiscipline of pattern recognition which seeks to expedite the learning process by exploiting data harvested from demonstrations of a given task. While substantial work has been done in developing imitation techniques for humanoid robots, there has been comparatively little exploration of the challenges posed by interactive computer games. Given that such games generally encode behaviours which are far more complex and interesting than simple limb movement, that they often provide inbuilt facilities for recording human play, that the generation and collection of data is therefore far easier than in robotics, and that many games have vast pre-existing libraries of these recorded demonstrations, it is fair to say that computer games represent an extremely fertile domain for imitation learning research.

This talk will present an overview of imitation learning, including a breakdown of in-game behaviour based closely on an existing and widely-accepted psychological model. An API based around Quake 2, a well-known commercial game of the so-called "first person shooter" genre, will also be presented. This API, known as the Quake Agent Simulation Environment (QASE), is currently in joint development with Blekinge Institute, Sweden and Kyushu University of Japan. QASE is capable of parsing data recorded during human gameplay, of providing an interface between MATLAB and the Quake 2 game server, and of realising artificial agents which reproduce observed human behaviours. Some in-game examples of such agents will also be shown.

Recent breakthroughs against SHA-1

Last week, the cryptographic "guru" Bruce Schneier announced a new and effective series of attacks against the Secure Hash Algorithm by a research team based in China and the US. In this talk, we give a basic overview of the Secure Hash Algorithm, sketch the (as yet limited) details of the attack, and speculate on the implications for the computing industry at large.

Hybrid SMT: Robust Sub-Sentential Alignment of Phrase-Structure Trees


Data-Oriented Translation (DOT), based on Data-Oriented Parsing, is a language-independent machine translation engine which exploits parsed, aligned bitexts to produce very high-quality translations. However, data acquisition constitutes a serious bottleneck, as DOT requires parsed sentences aligned at both sentential and sub-structural levels. Manual sub-structural alignment is a time-consuming and error-prone task, requiring knowledge of both the source and target languages as well as how they are related.

My research focuses on the automation of this alignment process, which is essential in order to carry out the large-scale translation experiments necessary to assess the full potential of DOT.

Long-Term Memory in the Irish Market (ISEQ): Additional Insight from Wavelet Transform

 

Researchers have used many different methods to detect the possibility of long-term dependence in stock market returns and, generally, there is mixed evidence for the presence of long memory in these data. Here, three different tests (namely Rescaled Range (R/S), its modified form, and GPH), in addition to a new approach using the discrete wavelet transform (DWT), have been applied to the daily returns of five Irish Stock Exchange (ISEQ) indices. These methods have also been applied to the volatility measures (namely absolute and squared returns). The aim is to investigate the existence of long-term memory properties. The indices are Overall, Financial, General, Small Cap and ITEQ, and the results of these approaches show that there is no evidence of long-range dependence in the returns themselves, but there is strong evidence for such dependence in the squared and absolute returns. Moreover, the discrete wavelet transform (DWT) has the additional advantage of providing an in-depth view of the data sets, and this gives us a real indication of structure in long memory effects, e.g. giving a clear picture of the movements in the series.
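
As a rough illustration of the first of these tests, the classical rescaled range statistic for a single window of returns can be computed as below. This is a minimal sketch under the textbook definition, not the code used in the study; the function name is hypothetical, and the modified R/S, GPH and wavelet approaches are not shown.

```python
import math

def rescaled_range(series):
    """Classical R/S statistic for one window of observations."""
    n = len(series)
    mean = sum(series) / n

    # Cumulative sums of deviations from the mean.
    cumulative, running = [], 0.0
    for x in series:
        running += x - mean
        cumulative.append(running)

    r = max(cumulative) - min(cumulative)                    # range of the cumulative deviations
    s = math.sqrt(sum((x - mean) ** 2 for x in series) / n)  # standard deviation of the window
    return r / s
```

In practice the statistic is computed over windows of increasing length, and the slope of log(R/S) against log(window length) gives an estimate of the Hurst exponent; values well above 0.5 are usually read as a sign of long memory.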

Monte Carlo and Cellular Automata methods for Simulation of Drug Dissolution

The objective of this investigation is to use Direct Monte Carlo techniques in simulating drug delivery from compacts of complex composition, taking into consideration the special features of the in vitro dissolution environment. This research focuses on simulating a binary system consisting of a poorly soluble drug dispersed in a matrix of highly soluble acid excipient. At dissolution, the acid excipient develops certain mechanisms, based on local pH modifications of the medium, which strongly influence drug release. Our model directly accounts for such effects as local interactions of the dissolving components, development of wall-roughness at the solid-liquid interface, a moving concentration boundary layer and mass transport by advection. Results agree qualitatively with experimental data and have demonstrated that, when modelling dissolution in vitro, special attention must be paid to including the particular conditions of the dissolution environment.

Computer-Assisted Language Learning (CALL) for Dyslexic Learners

Dyslexia is a Specific Learning Difficulty (SLD). It is a deficit in the processing of phonological information and manifests itself as a difficulty in reading, writing and spelling. Approximately 10% of the population have dyslexia, with 4% of the population being severely dyslexic.

My presentation provides a background to dyslexia and special education issues in Ireland and outlines some of the Information and Communication Technology (ICT) tools that are beneficial to dyslexic people.

I will then discuss my research into the aspects of CALL that cater to dyslexic needs, with reference to CALL courseware for dyslexic teenagers that I am developing.


Poitin: Distilling Theorems from Programs

In this research work, the unfold/fold based transformation technique distillation, which is an extension of the supercompilation technique, is used to prove inductive theorems. Generalisation is performed to ensure the termination of both the supercompilation and distillation techniques, but less generalisation is performed for distillation, thus allowing more theorems to be proved. This more powerful distillation technique can prove theorems fully automatically which would otherwise require intermediate lemmas, and can therefore prove a vast range of theorems which cause problems for existing theorem provers.

Web Service Technologies and Real World Applications

Web Services are a collection of technologies that realise heterogeneous interoperability by utilising an open and standardised set of protocols for data exchange. Although there is much talk about Web Services and their associated technologies there are few examples of Web Services in action in industry. In this talk I will attempt to address this issue by providing some interesting examples of real world applications of Web Services. I will also motivate the need for more complex Web Services infrastructures to support enterprise applications.

Some useful background resources related to the talk are available at...
http://en.wikipedia.org/wiki/Web_services
http://en.wikipedia.org/wiki/Service-oriented_architecture
http://en.wikipedia.org/wiki/BPEL

TransBooster: piece-wise machine translation

While state-of-the-art machine translation systems seem to deal well with short sentences and phrases, introducing longer sentences often leads to errors. The purpose of the current project (TransBooster) is to break down sentences and then recompose their translations in such a way that critical syntactic context is not omitted, but the chunks that are submitted for translation are of manageable length for the MT system. Issues involve finding the best substitution schema for replacing complex syntactic elements with dummy variables, singling out the syntactic elements that may be omitted from translation and identifying which parts of the MT output correspond to which chunks of the input. We have been evaluating TransBooster in conjunction with the LogoMedia MT engine.

Project Management Failure - Does a practical solution exist?

Industry statistics suggest that there is a high incidence of project management failure globally, with an associated cost of failure that is equally high but largely unrecognised or ignored by organisations. This study investigates whether it is possible to define a solution which can reduce the incidence of project management failure for an organisation and is also relatively effortless in its application. The solution proposed consists of two components -
  1. The project management methodology type
  2. The training approach
Both of these components have been implemented in a large multi-national test organisation and their influence on the outcome of a test group of projects will be measured over a 3-9 month period. It is hoped that the outcome will be positive for all or some of this test group of projects and that some correlation will be established between the outcome and the project management methodology and training approach applied.

Transbooster: boosting the performance of existing MT by complex sentence reduction.

This presentation is an extension of last week's seminar on Transbooster. The goal of Transbooster is to improve the quality of current Machine Translation output, not by proposing redevelopment from scratch, but by building on the strengths of existing MT engines while trying to correct their common shortcomings. The motivation for the project lies in the fact that many existing wide-coverage translation systems handle simple or short sentences better than linguistically complex ones. The input to the algorithm that we have developed to reduce complexity for the client Machine Translation system will eventually have to be produced by a parser if previously unseen data are to be processed. In this presentation, I will give an overview of parsing techniques that could be used and sum up the strengths and weaknesses of our approach.

Automatic Summarisation of Digital Video based on Analysis of Musical Score

There are thousands of movies available to the viewing public, but the question is how do we choose among them when deciding what to watch? Do we yield to the marketing strategies of the studios, listen to the critics’ reviews, or follow viewer recommendations? In our opinion, it would be better to be able to access a summary of a film based on what happens in it and how it’s intended to make you feel, so that we can make up our own minds. To that end, we're working on an approach to the automatic summarisation of movies, based primarily on an analysis of the musical score as an indicator of content - automatically analysing movie content, with a view to providing content description.

Level-based Indexing for Optimising XML Queries

Many of the problems with native XML databases relate to query performance and, consequently, it can be difficult to convince traditional database users of the benefits of using semi-structured or unstructured databases. In particular, the ongoing development of the XQuery language requires that performance-related issues are resolved. At present, there is still no index structure that efficiently supports both navigational and structural queries and the traditional data-centric and content queries. This paper presents an extended index structure based on the preorder traversal rank and the level (or depth) rank of each node in a document tree. The extended index fully supports the navigation of all XPath axes while efficiently supporting data-centric queries. The ability to start path traversals from arbitrary nodes in a document tree also enables the extended index to support the evaluation of path traversals embedded in XQuery expressions. Furthermore, an encoding technique for this extended index structure is presented, where properties of a level ranking may be exploited to provide efficient and optimised path traversals and, in certain cases, optimal solutions to path traversals.
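
The basic idea of labelling each node with its preorder rank and its level can be sketched as follows. This is only an illustration of the two ranks and of how they answer simple axis queries. It is not the extended index or the encoding technique presented in the paper, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_index(root):
    """Return a list of (tag, level) tuples; list position is the preorder rank."""
    index = []

    def visit(node, level):
        index.append((node.tag, level))
        for child in node:
            visit(child, level + 1)

    visit(root, 0)
    return index

def descendants(index, pre):
    """Preorder ranks of all descendants of the node with rank `pre`.

    Descendants are the nodes that follow in preorder until the first
    node whose level is not deeper than the starting node's level.
    """
    level = index[pre][1]
    result = []
    for rank in range(pre + 1, len(index)):
        if index[rank][1] <= level:
            break
        result.append(rank)
    return result

def children(index, pre):
    """Child axis: descendants exactly one level deeper."""
    level = index[pre][1]
    return [r for r in descendants(index, pre) if index[r][1] == level + 1]

doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")
index = build_index(doc)
```

Because a traversal can begin at any rank in the list, queries do not have to start from the document root, which is the property the abstract relies on for path traversals embedded in larger expressions.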

Consistency management mechanisms in Internet content delivery

Facilitating the exchange of data between organisations located at geographically distributed sites was the main motivation for the development of the Internet. Today, as a consequence of the popularity of the world-wide-web, and as the E-business revolution gathers momentum, new methods for the distribution of content from origin servers to end-users are required. In the coming years, data will be highly replicated, and processed deep inside the network, introducing a whole host of consistency management issues. In this presentation, a variety of content delivery architectures, from the traditional client-server model, to the futuristic on-demand/edge computing platform will be illustrated. The issue of consistency management will be explored, followed by a description of current techniques. Details of my own work thus far in the area will be given, together with an outline of future research direction.

A Metadata Repository for Facilitating Process Composition

Service oriented architectures provide a modern paradigm for web services allowing seamless interoperation among network applications and supporting a flexible approach to building large complex information systems. A number of industrial standards have emerged to exploit this paradigm with the development of the J2EE and .NET infrastructure platforms, communication protocol SOAP, description language WSDL and orchestration languages BPEL, XLANG and WSCI. At the same time the Semantic Web enables automated use of ontologies to describe web services in a machine interpretable language. In previous work, we presented a Peer-to-Peer infrastructure for large scale data integration. In this presentation a service infrastructure to provide an e-business layer exploiting current web service technologies will be displayed.  In this context, we present a distributed service repository over a super-peer network facilitating process composition. To provide tangible reliability for services and processes,  our framework is introduced to support the e-business layer.

Finding New News: Novelty Detection in Broadcast News

The automatic detection of novelty, or newness, as part of an information retrieval system would greatly improve a searcher's experience by presenting "documents" in order of how much extra information they add to what is already known instead of how similar they are to a user's query. This would be particularly useful in applications such as searching broadcast news. We present a novelty detection system evaluated on the AQUAINT text collection as part of our TREC 2004 Novelty Track experiments. We also discuss how we are extending the text-only approach to novelty detection to also include input from video analysis.
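
The contrast between ranking by novelty and ranking by similarity to a query can be illustrated with a simple baseline: score each incoming document by how dissimilar it is to everything already seen. This is an assumed, generic baseline using bag-of-words cosine similarity, not the system evaluated in the TREC experiments; the function names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def novelty_scores(docs):
    """Score each document as 1 minus its highest similarity to any earlier document."""
    seen, scores = [], []
    for text in docs:
        vec = Counter(text.lower().split())
        max_sim = max((cosine(vec, s) for s in seen), default=0.0)
        scores.append(1.0 - max_sim)
        seen.append(vec)
    return scores
```

A repeated story scores close to 0 and an unrelated story scores 1, so sorting by this score puts the documents that add the most new information first.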


Side Channel Analysis of Pairing-Based Cryptography

Pairing-based cryptography (PBC) is currently one of the most popular topics in cryptography. Pairings such as the Weil, Tate or Eta have the essential property of bilinearity, which is the core operation in the construction of a number of ECC-based protocols. The development of efficient algorithms to compute Pairings makes them, and their respective protocols, perfect candidates for use on smartcard systems.

Implementation of these primitives on smartcards cannot pass without the assessment of vulnerability of Pairings to SCA, since it is one of the most potent forms of cryptanalysis of smartcard systems.

In this talk an overview of Pairing-based cryptography will be presented and a first look at the application of SCA to PBC will be discussed.

Probabilistic detection of ungrammatical sentences

In theory, parsing directly yields a grammaticality judgement: a sentence is grammatical if and only if it can be parsed with a grammar of the language in question. Unfortunately, writing grammars for natural languages is very difficult and nobody has actually succeeded in providing a complete grammar yet. Hand-written grammars usually achieve deficient coverage, i.e. a parser will reject sentences that are judged grammatical by most native speakers. Data-driven methods, on the other hand, often generalise too much and produce grammars that render nearly any sequence of words grammatical. The latter grammars are still useful in applications that assume valid input and just need an analysis of the input. Probabilistic parsers use statistical information attached to the grammar in order to select a plausible parse among the set of possible parses. Over-generalisation and probabilistic selection together result in high robustness to errors and broad coverage of language. These are very desirable properties in many applications. We propose to exploit the output of existing probabilistic parsers to judge grammaticality. I will explain why a simple threshold method cannot work and outline our idea to tackle this problem. Preliminary experiments show that the main prerequisite of our approach might actually hold.

Distillation: Higher Order Transformation for Higher Order Programs


15 May 2005