Transfer Talk - Suzanne McCarthy - 27th June 2018

Title: A Lightweight ETL Framework for On Demand Queries

Data warehouses provide a strategic advantage for the companies that can afford them. They supply the datasets for predictive algorithms, decision support systems and online analytical processing, which together enable an enterprise to automate decision making and predict future patterns for its products, services and in-house requirements. The downside is that data warehouses are costly and difficult to build and maintain. Integrating data from heterogeneous sources is never easy, and cleaning and reformatting data as it moves from operational databases into the warehouse is time-consuming. As a result, the Extract-Transform-Load (ETL) process built to integrate data into the warehouse is very costly to modify as needs evolve. Furthermore, because web data is unstructured, prone to change and costly to validate, commercial systems that integrate web data with enterprise data rely on supports such as APIs, which not all data sites provide.
In this research, we propose a new form of data warehouse construction that adopts a more flexible approach. It replaces the traditional predefined ETL process and its periodic updates with a more dynamic construction of queryable data marts, combined with an on-demand refresh process. Besides the dual benefits of lower cost and faster integration of new data sources, the on-demand feature ensures that large volumes of redundant data are not processed needlessly.
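The contrast between periodic batch ETL and on-demand refresh can be illustrated with a minimal Python sketch. It is not the framework presented in the talk: the class `OnDemandMart`, its `extract` and `transform` callables and the `max_age_seconds` staleness threshold are hypothetical names chosen for illustration, assuming a mart is refreshed only when a query arrives and its cached data is older than the threshold. A source that is never queried is never extracted or transformed.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class OnDemandMart:
    """Hypothetical data mart that runs its ETL steps only when queried and stale."""
    extract: Callable[[], List[Row]]               # pulls raw rows from a source (assumed interface)
    transform: Callable[[Row], Optional[Row]]      # cleans one row; returns None to drop it
    max_age_seconds: float                         # assumed staleness threshold
    clock: Callable[[], float] = time.monotonic    # injectable clock, useful for testing
    _rows: List[Row] = field(default_factory=list, init=False)
    _loaded_at: Optional[float] = field(default=None, init=False)
    refresh_count: int = field(default=0, init=False)

    def _is_stale(self) -> bool:
        # Never loaded, or loaded longer ago than the threshold.
        return (self._loaded_at is None
                or self.clock() - self._loaded_at > self.max_age_seconds)

    def _refresh(self) -> None:
        # Extract, transform and load in one pass, keeping only valid rows.
        cleaned = []
        for raw in self.extract():
            row = self.transform(raw)
            if row is not None:
                cleaned.append(row)
        self._rows = cleaned
        self._loaded_at = self.clock()
        self.refresh_count += 1

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # The refresh is triggered by the query itself, not by a schedule.
        if self._is_stale():
            self._refresh()
        return [r for r in self._rows if predicate(r)]


# Usage: a toy source with one invalid price, and a fake clock.
source = [{"product": "A", "price": "10.5"},
          {"product": "B", "price": "n/a"},
          {"product": "C", "price": "3"}]


def to_typed(raw: Row) -> Optional[Row]:
    try:
        return {"product": raw["product"], "price": float(raw["price"])}
    except ValueError:
        return None  # drop rows that fail validation


now = [0.0]
mart = OnDemandMart(extract=lambda: list(source), transform=to_typed,
                    max_age_seconds=60, clock=lambda: now[0])

print(mart.query(lambda r: r["price"] > 5))  # [{'product': 'A', 'price': 10.5}]
now[0] = 30.0
mart.query(lambda r: True)   # data is still fresh, so no refresh happens
now[0] = 100.0
mart.query(lambda r: True)   # data is stale, so the ETL steps run again
print(mart.refresh_count)    # 2
```

The design choice here is that freshness is checked lazily at query time; a real system would also need to decide per source how stale data may become, which this sketch reduces to a single threshold.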