Reduction Case Study
Reduction is a common algorithmic operation used in parallel programming to reduce an array of elements into a shorter array of elements or a single value. This document uses reduction to introduce some key considerations when designing and optimizing GPU algorithms.
This repository hosts the sample code used in the HIP documentation.
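To make the operation described above concrete, here is a minimal host-side sketch of a pairwise (tree-shaped) reduction. It is not taken from this repository: `tree_reduce` and `sum_example` are hypothetical names, and the code runs sequentially on the CPU purely to show how each pass halves the number of live elements, which is the pattern GPU reductions typically build on. It assumes the operator is associative and commutative, because elements are combined out of their original order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of this repository): reduces `data` to a
// single value by repeatedly combining element i with element i + half.
// `identity` is returned only when the input is empty.
template <typename T, typename BinaryOp>
T tree_reduce(std::vector<T> data, T identity, BinaryOp op)
{
    if (data.empty())
        return identity;

    // Each pass roughly halves the number of live elements.
    for (std::size_t active = data.size(); active > 1; active = (active + 1) / 2)
    {
        const std::size_t half = (active + 1) / 2;
        // On a GPU, every iteration of this inner loop would map to one thread.
        for (std::size_t i = 0; i + half < active; ++i)
            data[i] = op(data[i], data[i + half]);
    }
    return data[0];
}

// Usage sketch: summing five integers.
inline int sum_example()
{
    const int sum = tree_reduce(std::vector<int>{1, 2, 3, 4, 5}, 0, std::plus<int>{});
    assert(sum == 15);
    return sum;
}
```

The inner loop has no dependencies between iterations, which is what makes each pass a candidate for parallel execution.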
Structure
The coding style and the directory structure mostly follow those of rocPRIM, differing in a few ways:
- Unbound by the C++14 requirement of rocPRIM dictated by hipCUB and rocThrust, this repository uses C++20 as the baseline.
- As such, implementations are free to make use of some TMP/constexpr helper functions found within `include/tmp_utils.hpp`.
- The tests and benchmarks don't initialize resources multiple times, but do so just once and reuse the same input for tests/benchmarks of various sizes.
- Neither the tests nor the benchmarks use prefixes for input initialization. Instead, they both create a function object storing all state, which the tests capture by reference.
- "Diffing" the various implementations in succession reveals the minor changes between each version. `v0.hpp` is a simple Parallel STL implementation which is used for verification and as a performance baseline for comparison.
- The `example` folder holds the initial implementations of the various optimization levels of the benchmarks.
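The points above about one-time initialization, a state-holding object captured by reference, and a standard-library baseline can be sketched as follows. This is an illustration under assumptions, not the repository's code: `baseline_reduce`, `reduction_state`, and `verify_all_sizes` are hypothetical names, and the baseline calls the sequential overload of `std::reduce` to stay free of extra dependencies, whereas a Parallel STL baseline would pass an execution policy such as `std::execution::par_unseq` as the first argument.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a baseline such as v0.hpp (the real file is not
// reproduced here). Sequential std::reduce is used for simplicity.
struct baseline_reduce
{
    template <typename It, typename T, typename BinaryOp>
    T operator()(It first, It last, T init, BinaryOp op) const
    {
        return std::reduce(first, last, init, op);
    }
};

// Hypothetical state object: the input is allocated and filled once at the
// largest size, then prefixes of it are reused for every smaller size.
struct reduction_state
{
    std::vector<unsigned> input;

    explicit reduction_state(std::size_t max_size) : input(max_size)
    {
        std::iota(input.begin(), input.end(), 1u); // 1, 2, 3, ...
    }

    // Runs `impl` on the first `size` elements and compares with the baseline.
    template <typename Impl>
    bool verify(const Impl& impl, std::size_t size) const
    {
        assert(size <= input.size());
        auto op = [](unsigned a, unsigned b) { return a + b; };
        const auto first = input.begin();
        const auto last  = first + static_cast<std::ptrdiff_t>(size);
        return impl(first, last, 0u, op) == baseline_reduce{}(first, last, 0u, op);
    }
};

// The check lambda captures the shared state by reference instead of
// re-creating the input for every size.
inline bool verify_all_sizes(const reduction_state& state)
{
    // Candidate under test: a plain loop standing in for a later version.
    auto candidate = [](auto first, auto last, unsigned init, auto op) {
        for (; first != last; ++first)
            init = op(init, *first);
        return init;
    };
    auto check = [&state, &candidate](std::size_t size) {
        return state.verify(candidate, size);
    };

    bool ok = true;
    for (std::size_t size : {std::size_t{1}, state.input.size() / 2, state.input.size()})
        ok = ok && check(size);
    return ok;
}
```

Constructing `reduction_state` once and passing it to `verify_all_sizes` exercises several input sizes without repeating the allocation or the fill.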