Muhammad Shafiq, Miquel Pericàs
Computer Sciences
Barcelona Supercomputing Center
Barcelona, Spain
{muhammad.shafiq, miquel.pericas}@bsc.es

Nacho Navarro
Dept. Arquitectura de Computadors
Universitat Politècnica de Catalunya
Barcelona, Spain
nacho@ac.upc.edu

Eduard Ayguadé
Computer Sciences
Barcelona Supercomputing Center
Barcelona, Spain
eduard.ayguade@bsc.es
Abstract—In the race towards computational efficiency, accelerators are achieving prominence. Among the different types, accelerators built using reconfigurable fabric, such as
FPGAs, have tremendous potential because the hardware can be customized to the application. However, the lack of a standard design methodology hinders the adoption of such devices and makes portability and reusability across designs difficult. In addition, the generation of highly customized circuits does not integrate well with high-level synthesis tools.
In this work, we introduce TARCAD, a template architecture for designing reconfigurable accelerators. TARCAD enables high customization in the data management and compute engines while retaining a programming model based on generic programming principles. The template offers generality and scalable performance over a range of FPGAs. We describe the template architecture in detail and show how to implement five important scientific kernels: MxM, Acoustic Wave Equation, FFT, SpMV and Smith-Waterman. TARCAD is compared with other High Level Synthesis models and is evaluated against GPUs, an architecture that is far less customizable and, therefore, also easier to target from a simple and portable programming model. We analyze the TARCAD template and compare its efficiency on a large Xilinx Virtex-6 device to that reported in several recent GPU studies.
I. INTRODUCTION
The integration levels of current FPGA devices advanced to the point where all functions of a complex