Comparison of high performance computing methods for high resolution Radon transform and deblending

Kai Zhuang, Daniel O. Trad

We implemented a sparse Radon transform and a least-squares deblending algorithm in C++ using several parallel processing methods. In this paper, we compare three Application Programming Interfaces (APIs) for parallel computing on large datasets: OpenMP, Open MPI, and CUDA. Our goal is to understand how each method scales with our codes and to explore the advantages and drawbacks of each API. The sparse Radon transform and least-squares deblending serve as our test algorithms because both stand to benefit from parallelization. The sparse Radon transform is easily parallelized because each Radon frame is calculated independently of the others, so calculation time is greatly reduced, in proportion to the resources provided. The least-squares deblending algorithm, on the other hand, is not efficient when implemented with Open MPI: computing the least-squares gradient requires applying the forward and adjoint blending operators, which involve re-sorting the entire dataset at each iteration. The Open MPI implementation of least-squares deblending must therefore collect all the data on a single main node at every iteration, and the resulting data-transfer overhead often outweighs the gain in computing performance. By implementing the deblending in CUDA on a single local machine, we significantly reduce the data-copying overhead and increase the computational speed of the gradient calculation. A CUDA implementation on distributed GPUs (GPU clusters) would most likely suffer from the same issues as the Open MPI version for inversion, but this was not tested for this report.
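
As an illustration of why independent Radon frames parallelize so readily, the following is a minimal C++ sketch using an OpenMP parallel loop over frames. It is not the authors' implementation: the `Frame` struct, the function names `slantStack` and `transformAll`, and the data layout are all assumptions made for this example, and the per-frame kernel is a plain adjoint linear Radon operator (a slant stack) with nearest-sample shifts, standing in for the more expensive sparse solver that would run on each frame.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: one frame holds nx traces of nt samples each,
// stored trace by trace, so sample (ix, it) is d[ix * nt + it].
struct Frame {
    std::size_t nt;
    std::size_t nx;
    std::vector<float> d;
};

// Adjoint linear Radon operator (slant stack) for a single frame:
// m(tau, p) = sum over x of d(tau + p * x, x), with nearest-sample shifts.
// A sparse Radon transform would wrap an iterative solver around an
// operator like this one; that part is omitted here.
std::vector<float> slantStack(const Frame& f,
                              const std::vector<float>& x,   // offsets, size nx
                              const std::vector<float>& p,   // slownesses
                              float dt) {
    assert(f.d.size() == f.nt * f.nx && x.size() == f.nx);
    std::vector<float> m(p.size() * f.nt, 0.0f);
    const long nt = static_cast<long>(f.nt);
    for (std::size_t ip = 0; ip < p.size(); ++ip) {
        for (std::size_t ix = 0; ix < f.nx; ++ix) {
            const long shift = std::lround(p[ip] * x[ix] / dt);
            for (long it = 0; it < nt; ++it) {
                const long src = it + shift;
                if (src >= 0 && src < nt) {
                    m[ip * f.nt + it] += f.d[ix * f.nt + src];
                }
            }
        }
    }
    return m;
}

// Each frame is read and written by exactly one loop iteration, so the
// iterations share no mutable state and need no synchronization.
std::vector<std::vector<float>> transformAll(const std::vector<Frame>& frames,
                                             const std::vector<float>& x,
                                             const std::vector<float>& p,
                                             float dt) {
    std::vector<std::vector<float>> out(frames.size());
    const long long n = static_cast<long long>(frames.size());
    #pragma omp parallel for schedule(dynamic)
    for (long long i = 0; i < n; ++i) {
        out[i] = slantStack(frames[i], x, p, dt);
    }
    return out;
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result. The contrast with deblending is that no iteration here needs data owned by another iteration, whereas the blending operators described above touch the entire dataset at every step.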