Venkatesh Kannan - Transfer Talk - Tuesday 27th May 2014

Title: Automated Parallelisation of Functional Programs for Heterogeneous Architectures


Heterogeneous computing systems are composed of a variety of parallel processing units, most commonly multi-core CPUs and many-core GPUs. To effectively utilise the parallel computing power of this hardware, program execution needs to be parallelised. As manual program parallelisation is challenging and error-prone in practice, it needs to be automated for both efficiency and accuracy.

Over the years, many program transformation methods have been proposed to systematically derive parallel programs. However, they still require some manual input during the parallelisation process. Other techniques that automate the parallelisation of programs address only a single type of processing unit. There is therefore a need to automate program parallelisation for the heterogeneous processing units that are omnipresent in computing systems today.

In this transfer talk, we present the status of our research to automate the parallelisation of a given functional program for its efficient execution on heterogeneous hardware with multi-core CPUs and GPUs. We present our work in three segments: (1) the identification of parallel computations in an arbitrary program, (2) the efficient execution of the identified parallel computations on CPUs and GPUs, and (3) the evaluation of our parallelisation approach.

To identify potential parallelism, we begin by characterising the parallel computations that may occur in a program, and represent them using skeletons that are widely used for parallel program development. Following this, we use a program transformation technique called distillation to help identify possible uses of skeletons to implement the program. Each identified skeleton use will then be scheduled for execution on a suitable processing unit, based on its characteristics. For this, we plan to use the Accelerate library of operations, which generates efficient OpenCL implementations for the chosen skeletons that can be executed on CPUs and GPUs. This would allow the given program’s parallel execution on heterogeneous hardware.
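
As a rough illustration of what "representing a computation using skeletons" can mean, the sketch below writes a dot product twice: once as a direct recursive definition, and once as a composition of a map skeleton and a reduce skeleton. This is not the distillation transformation and not the Accelerate API. Python is used only for brevity, and every function name here is hypothetical.

```python
from functools import reduce
from operator import add, mul


def dot_recursive(xs, ys):
    """Direct recursive definition; the parallelism is hidden in the recursion."""
    if not xs or not ys:
        return 0
    return xs[0] * ys[0] + dot_recursive(xs[1:], ys[1:])


def map_skeleton(f, *seqs):
    """Map skeleton: every application of f is independent of the others,
    so a parallel backend could evaluate them concurrently.
    (Evaluated sequentially here; this only shows the structure.)"""
    return [f(*args) for args in zip(*seqs)]


def reduce_skeleton(op, identity, xs):
    """Reduce skeleton: if op is associative, partial results can be
    combined in any grouping, which permits tree-shaped parallel reduction.
    (Evaluated sequentially here.)"""
    return reduce(op, xs, identity)


def dot_skeletons(xs, ys):
    """The same dot product expressed as reduce-after-map."""
    return reduce_skeleton(add, 0, map_skeleton(mul, xs, ys))


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [5, 6, 7, 8]
    # Both definitions compute the same value.
    assert dot_recursive(xs, ys) == dot_skeletons(xs, ys)
    print(dot_skeletons(xs, ys))
```

Once a program is in the second form, each skeleton call is a unit that could in principle be handed to whichever processing unit suits it; how that choice is made is part of the work described in the talk and is not modelled here.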

We are presently working on transformation techniques to parallelise the distilled form of a given program. These techniques will be formally specified once they are complete. Following this, we will complete our current solutions to execute the parallel computations on CPUs and GPUs. We will then comprehensively evaluate our approach through qualitative and quantitative analysis of our parallelisation techniques and their results.