Implementation of non local means filter in GPUs

Adrián Márques, Alvaro Pardo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations

Abstract

In this paper, we review some alternatives to reduce the computational complexity of the Non-Local Means image filter and present a CUDA-based implementation of it for GPUs, comparing its performance on different GPUs and with respect to reference CPU implementations. Starting from a naive CUDA implementation, we describe different aspects of CUDA and the algorithm itself that can be leveraged to decrease the execution time. Our GPU implementation achieved speedups of up to 35.8x with respect to our reduced-complexity reference implementation on the CPU, and more than 700x over a plain CPU implementation.
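For orientation, the "plain" form of the filter that such speedups are measured against can be sketched as below. This is a minimal illustration of the textbook Non-Local Means weighted average, not the authors' reference code: the function names `nlm_pixel` and `nlm_filter`, the parameters `search`, `patch`, and `h`, and the clamp-to-edge border handling are all assumptions made for the example. Each output pixel costs roughly (2·search+1)² × (2·patch+1)² operations and depends only on the input image, which is the kind of independence a one-thread-per-pixel GPU kernel can exploit.

```python
import math


def nlm_pixel(img, y, x, search=2, patch=1, h=10.0):
    """Denoise one pixel with a plain Non-Local Means weighted average.

    img    : list of rows of floats (grayscale image)
    search : half-width of the search window (assumed parameter name)
    patch  : half-width of the comparison patch (assumed parameter name)
    h      : filtering strength; larger values smooth more
    """
    height, width = len(img), len(img[0])

    def at(r, c):
        # Clamp coordinates so windows near the border stay inside the image
        # (an assumption; other border policies are possible).
        r = min(max(r, 0), height - 1)
        c = min(max(c, 0), width - 1)
        return img[r][c]

    patch_area = (2 * patch + 1) ** 2
    weight_sum = 0.0
    value_sum = 0.0
    for qy in range(y - search, y + search + 1):
        for qx in range(x - search, x + search + 1):
            # Mean squared difference between the patch around (y, x)
            # and the patch around the candidate (qy, qx).
            d2 = 0.0
            for dy in range(-patch, patch + 1):
                for dx in range(-patch, patch + 1):
                    diff = at(y + dy, x + dx) - at(qy + dy, qx + dx)
                    d2 += diff * diff
            d2 /= patch_area
            # Similar patches get weights near 1, dissimilar ones near 0.
            w = math.exp(-d2 / (h * h))
            weight_sum += w
            value_sum += w * at(qy, qx)
    # The candidate (y, x) itself always contributes weight 1,
    # so weight_sum is never zero.
    return value_sum / weight_sum


def nlm_filter(img, search=2, patch=1, h=10.0):
    """Apply nlm_pixel to every pixel; each pixel is computed independently."""
    height, width = len(img), len(img[0])
    return [
        [nlm_pixel(img, y, x, search, patch, h) for x in range(width)]
        for y in range(height)
    ]
```

Calling `nlm_filter(image)` on a list-of-rows grayscale image returns a new image of the same shape. Because every output value is a convex combination of input pixels, the result always stays within the input's intensity range.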

Original language: English
Title of host publication: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications - 18th Iberoamerican Congress, CIARP 2013, Proceedings
Pages: 407-414
Number of pages: 8
Edition: PART 1
State: Published - 2013
Event: 18th Iberoamerican Congress on Pattern Recognition, CIARP 2013 - Havana, Cuba
Duration: 20 Nov 2013 to 23 Nov 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 8258 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th Iberoamerican Congress on Pattern Recognition, CIARP 2013
Country/Territory: Cuba
City: Havana
Period: 20/11/13 to 23/11/13

Keywords

  • CUDA
  • GPU
  • Image denoising
  • Non-local means
