General Information

My name is Martin O'Connor. I am currently pursuing a Ph.D. in the area of dynamic labeling schemes for XML, with a particular emphasis on supporting XML updates.

I am a member of the Interoperable Systems Group (ISG) and my supervisor is Dr. Mark Roantree. The ISG research team is based in the School of Computing at Dublin City University, Ireland.

 

 

Ph.D. Topic: Dynamic Labeling Schemes for XML Updates

Overview

The eXtensible Markup Language (XML) has been adopted as the new standard for data exchange on the World Wide Web. More and more information from the financial, legal, health, scientific and government domains is being processed, integrated, queried, transformed and published as XML. As the rate of XML adoption increases, there is an ever more pressing need to store and maintain the information in XML format, thereby eliminating the overhead of parsing and transforming XML in and out of various data formats. At present, most modern databases provide support for storing and querying XML documents. They also support the updating of XML data at the document level, but provide limited and inefficient support for the more fine-grained (node-based) updates within XML documents. In my doctoral research, I have focused on investigating the essential problems prohibiting efficient, effective and holistic XML updates, with the goal of designing solutions to overcome these problems.

Background

Supporting XML updates is a challenging problem because the basic structure (data model) underlying an XML document is a tree. The traditional (relational) database has a structure similar to a spreadsheet (a table with rows and columns) and the provision of table-based updates is a well-understood problem. However, updating a tree-based structure is much more challenging. There are four key differences between the simple table-based model and the more complex XML (tree-based) model.

  • Hierarchical: A table has a flat representation; a tree has a hierarchical representation.
  • Order: The order of the rows in a database table does not matter (each row has a unique id or primary key); the order of branches (nodes) and leaves in a tree is important.
  • Semi-structured: A table has a fixed structure (rows multiplied by columns); a tree has a flexible structure (an arbitrary combination of branches and leaves).
  • Meta-data: The information describing the meaning of the data in a table is stored separately from the table; in XML, the meaning of the data is stored in tags (markup) with the data itself inside the XML document.

The differences that set the XML data model apart from the table-based model present many unique challenges for the end-to-end management of XML documents, including storage, indexing, query planning, query processing, query optimization and result serialization. The provision of XML updates must facilitate all of these services in addition to supporting arbitrary node insertions and deletions within an XML document, and guaranteeing unique node identity and node order at all times. Indeed, the provision of efficient XML updates is still an open research problem. As the volume of XML data increases, and XML databases and XML repositories become more mainstream, the ability to support node-based XML updates efficiently becomes paramount.
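
To make the node identity and node order requirements concrete, the following is a minimal sketch of a simple prefix-based (Dewey-style) labeling, written in Python. It is only an illustration of the general problem and is not EBSL, SCOOTER or any scheme from the publications listed below; the names Label, label_children, is_ancestor and insert_child are hypothetical. Each node label is its parent's label plus a position number, so sorting labels gives document order, and a prefix test gives the ancestor relationship.

```python
from typing import List, Tuple

# A label is a tuple of integers, e.g. (1, 2, 1) for the path 1.2.1 (assumed encoding).
Label = Tuple[int, ...]


def label_children(parent: Label, count: int) -> List[Label]:
    """Label `count` children of `parent` as parent.1, parent.2, ... in document order."""
    return [parent + (i,) for i in range(1, count + 1)]


def is_ancestor(a: Label, b: Label) -> bool:
    """True if the node labeled `a` is a proper ancestor of the node labeled `b`."""
    return len(a) < len(b) and b[:len(a)] == a


def insert_child(parent: Label, siblings: List[Label], position: int) -> Tuple[List[Label], int]:
    """Insert a new child at index `position` among the existing `siblings`.

    With consecutive integer positions there is no unused label between two
    neighbours, so every sibling after the insertion point is renumbered.
    Returns the new sibling labels and how many existing labels had to change.
    """
    new_siblings = label_children(parent, len(siblings) + 1)
    changed = sum(
        1
        for old, new in zip(siblings[position:], new_siblings[position + 1:])
        if old != new
    )
    return new_siblings, changed


root: Label = (1,)
siblings = label_children(root, 4)  # (1, 1), (1, 2), (1, 3), (1, 4)

# Tuple comparison is lexicographic, so sorting labels yields document (preorder) order.
ordered = sorted([(1, 2), (1, 1, 1), root, (1, 1)])

# Inserting a new second child forces 3 of the 4 existing siblings to be relabeled.
new_siblings, changed = insert_child(root, siblings, 1)
```

In this simplified scheme a single insertion changes the labels of all following siblings (and, in a full tree, of their descendants as well). Avoiding that kind of relabeling while keeping labels unique and ordered is the problem that dynamic labeling schemes set out to address.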

Motivation

In my research, I analyzed several of the problems and bottlenecks in the provision of an XML update service and determined that they were in fact symptoms of an underlying root cause: an inappropriate tree labeling scheme used in the encoding of the XML documents. Almost all existing dynamic labeling schemes for XML were developed in isolation. Each tree labeling scheme was designed to solve a particular problem or provide a particular feature, often without taking into account the impact its solution may have on other key features. To date, there has been no attempt to identify the core properties (those representative of the desirable characteristics) of a good and holistic dynamic labeling scheme for XML.

Contributions (to date):

  • I performed a thorough analysis of all existing XML encoding and labeling schemes.
  • I identified a template of core properties representative of the desirable characteristics of a good dynamic labeling scheme (DLS) for XML.
  • I constructed a framework of metrics to permit an objective evaluation of all dynamic labeling schemes for XML.
  • I focused on three key properties central to the outstanding problems in existing DLS – compact encoding, label reuse and scalability. At present, there is no DLS that integrates support for all three key properties.
  • I designed a new DLS (named EBSL) to guarantee full and complete support for the reuse of deleted node labels. EBSL supports a deleted label reuse strategy that best suits the nature of node deletions and insertions in XML.
  • I designed a new compact, deterministic and scalable mathematical encoding mechanism to support an infinite number of label insertions and deletions.
  • I designed a new DLS for XML (named SCOOTER) that integrates all three key features: compact encoding, deleted node label reuse and scalability. Indeed, SCOOTER supports all of the core properties identified in our template of properties.

 

 

 

 

Other Research Interests:

The following is a list of some of the related areas in which I have an active research interest.

  • Data Management and Query Processing.
  • XML Storage and Indexing techniques.
  • Query planning and Query Optimization.
  • XML Updates and Dynamic Labeling Schemes.
  • Novel Algorithms and Data Structures.
  • Number Theory.
  • Mathematical Encoding Schemes.
  • Sensor Data Management and Data Integration.
  • Data Streaming Query Processing and Optimization.
  • Query Processing in P2P systems.
  • XPath, XQuery and all XML related technologies.
  • Relational and Object-Relational Databases.
  • Schema evolution and Schema Matching.

 

 

 

 

Journal Publications

  • Data Transformation and Query Management in Personal Health Sensor Networks. PDF
    Mark Roantree, Jie Shi, Paolo Cappellari, Martin F. O'Connor, Michael Whelan and Niall Moyna.
    Journal of Network and Computer Applications, Vol. 35, Issue 4, ISSN 1084-8045, pp.1191-1202, July 2012.

Peer-Reviewed Publications

  • SCOOTER: A Compact and Scalable Dynamic Labeling Scheme for XML Updates.
    Martin F. O'Connor and Mark Roantree.
    To appear in: The 23rd International Conference on Database and Expert Systems Applications, (DEXA 2012), Vienna, Austria, 3-7 September 2012, LNCS Springer Proceedings.

  • EBSL: Supporting Deleted Node Label Reuse in XML. PDF
    Martin F. O'Connor and Mark Roantree.
    The 7th International XML Database Symposium, (XSym 2010), Singapore, 17 September 2010, LNCS 6309, pp.73-87, Springer Proceedings.

  • Desirable Properties for XML Update Mechanisms. PDF
    Martin F. O'Connor and Mark Roantree.
    Updates in XML: Proceedings of the EDBT/ICDT 2010 Workshops, Lausanne, Switzerland, 22-26 March 2010, Vol. 426, Article No.: 23, ACM Proceedings.

  • Querying XML Data Streams from Wireless Sensor Networks: An Evaluation of Query Engines. PDF
    Martin F. O'Connor, Kenneth Conroy, Mark Roantree, Alan F. Smeaton and Niall M. Moyna.
    The 3rd International Conference on Research Challenges in Information Science (RCIS 2009), Fes, Morocco, 22-24 April 2009, pp.23-30, IEEE Proceedings.

  • Query Management in a Sensor Environment. PDF
    Martin F. O'Connor, Vincent Andrieu and Mark Roantree.
    The 14th International Conference on Parallel and Distributed Systems (ICPADS 2008), Melbourne, Australia, 8-10 December 2008, pp.835-840, IEEE Proceedings.

  • An Extended Preorder Index for Optimising XPath Expressions. PDF
    Martin F. O'Connor, Zohra Bellahsene, and Mark Roantree.
    The 3rd International XML Database Symposium (XSym 2005), Trondheim, Norway, 28-29 August 2005, LNCS 3671, pp.114-128, Springer Proceedings.

  • A list of all ISG publications may be found on the ISG publications page.

Master's Thesis (By Research)

  • Level-based Indexing for Optimising XML Queries. PDF
    Martin Francis O'Connor. Master of Science thesis (Research), Dublin City University, June 2005.