Beatriz Navidad Vilches f8cb857491 Fixed CMake linting 7 months ago
..
benchmark Fixed CMake linting 7 months ago
example Fixed CMake linting 7 months ago
include fix: a __global__ function cannot be a member function 7 months ago
test Fixed CMake linting 7 months ago
CMakeLists.txt Fixed CMake linting 7 months ago
CMakePresets.json Add reduction tutorial (#128) 11 months ago
README.md Add reduction tutorial (#128) 11 months ago
vcpkg.json Add reduction tutorial (#128) 11 months ago

README.md

Reduction Case Study

Reduction is a common algorithmic operation in parallel programming that reduces an array of elements to a shorter array or to a single value. This document uses reduction to introduce some key considerations in designing and optimizing GPU algorithms.
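To make the two forms of reduction concrete, here is a minimal host-side sketch in standard C++. It is not code from this repository: the names `reduce_all` and `reduce_blocks` are hypothetical, and the sketch runs sequentially on the CPU. It only illustrates what "a single value" and "a shorter array" mean, assuming an associative operator and an `init` value that is neutral for that operator.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: reduce the whole array to a single value.
// `init` should be the identity of `op` (0 for addition).
template <typename T, typename BinaryOp = std::plus<>>
T reduce_all(const std::vector<T>& input, T init = T{}, BinaryOp op = {})
{
    return std::reduce(input.begin(), input.end(), init, op);
}

// Hypothetical helper: reduce the array to a shorter array holding one
// partial result per block of `block_size` consecutive elements.
// The last block may be shorter than `block_size`.
template <typename T, typename BinaryOp = std::plus<>>
std::vector<T> reduce_blocks(const std::vector<T>& input,
                             std::size_t block_size,
                             T init = T{},
                             BinaryOp op = {})
{
    std::vector<T> partials;
    if (block_size == 0)
    {
        return partials; // nothing sensible to compute
    }
    for (std::size_t begin = 0; begin < input.size(); begin += block_size)
    {
        const std::size_t end = std::min(begin + block_size, input.size());
        const auto first = input.begin() + static_cast<std::ptrdiff_t>(begin);
        const auto last  = input.begin() + static_cast<std::ptrdiff_t>(end);
        partials.push_back(std::reduce(first, last, init, op));
    }
    return partials;
}
```

Applying `reduce_all` to the output of `reduce_blocks` gives the same result as reducing the original array directly, provided the operator is associative. A multi-pass GPU reduction relies on the same property.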

This repository hosts the sample code used in the HIP documentation.

Structure

The coding style and the directory structure mostly follow those of rocPRIM, differing in a few ways:

  • Unbound by the C++14 requirement of rocPRIM dictated by hipCUB and rocThrust, this repository uses C++20 as the baseline.
  • As such, implementations are free to make use of some TMP/constexpr helper functions found within include/tmp_utils.hpp.
  • The tests and benchmarks initialize resources only once and reuse the same input for tests and benchmarks of various sizes.
  • Neither the tests nor the benchmarks use prefixes for input initialization. Instead, both create a function object storing all state, which the tests capture by reference.
  • "Diffing" the various implementations in succession reveals the minor changes between consecutive versions. v0.hpp is a simple Parallel STL implementation that serves both as the reference for verification and as the performance baseline for comparison.
  • The example folder holds the initial implementations of the various optimization levels of the benchmarks.
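The "initialize once, reuse for every size" pattern described in the list above can be sketched on the host as follows. This is an illustration under stated assumptions, not the repository's code: `host_reduction` and `check_sizes` are hypothetical names, the function object stands in for an implementation that would own device buffers, and every requested size is assumed to be no larger than the shared input.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a reduction implementation object.
// It owns its scratch storage, so repeated calls reuse one allocation.
class host_reduction
{
public:
    explicit host_reduction(std::size_t max_size) { scratch_.reserve(max_size); }

    // Reduce the first `count` elements starting at `first`.
    int operator()(const int* first, std::size_t count)
    {
        ++call_count_;
        scratch_.assign(first, first + count); // reuses reserved capacity
        return std::reduce(scratch_.begin(), scratch_.end(), 0);
    }

    std::size_t call_count() const { return call_count_; }

private:
    std::vector<int> scratch_;
    std::size_t      call_count_ = 0;
};

// Check several problem sizes against one shared input and return the
// number of mismatches. Assumes every size is <= input.size().
inline std::size_t check_sizes(const std::vector<int>&         input,
                               const std::vector<std::size_t>& sizes,
                               host_reduction&                 reduce)
{
    std::size_t mismatches = 0;
    // The lambda captures the function object and the input by reference,
    // so no per-size initialization takes place.
    auto run_one = [&](std::size_t size)
    {
        const int expected = std::accumulate(input.begin(),
                                             input.begin() + static_cast<std::ptrdiff_t>(size),
                                             0);
        if (reduce(input.data(), size) != expected)
        {
            ++mismatches;
        }
    };
    for (const std::size_t size : sizes)
    {
        run_one(size);
    }
    return mismatches;
}
```

The input vector is created once by the caller, and each size only selects a prefix of it. This keeps allocation and initialization cost out of the per-size runs.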