This repository demonstrates the Daisytuner Optimizing Compiler Collection (DOCC) applied to the High-Performance Computing Conjugate Gradient (HPCCG) mini-app, showcasing its auto-parallelization (OpenMP) and auto-offloading (Tenstorrent, CUDA) capabilities. DOCC automatically analyzes computational kernels and generates optimized code for each target. The repository includes Daisy workflows for comprehensive performance analysis through our dashboard.
The case study focuses on optimizing three critical conjugate gradient kernels, which are executed iteratively as part of the solver: SPMV (sparse matrix-vector multiplication), WAXPBY (scaled vector addition, w = αx + βy), and DDOT (dot product). Each kernel has been optimized by DOCC to exploit parallelism, improve memory access patterns, and minimize data transfers for offloading targets with dedicated memory.
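For orientation, here is a minimal sketch of how the solver loop composes the three kernels. The signatures are simplified for illustration and do not reproduce the exact HPCCG code; the caller is assumed to initialize `r = b - A*x` and `p = r`.

```cpp
// Minimal sketch of the CG loop around the three kernels
// (simplified signatures; not the exact HPCCG code).
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// DDOT: dot product reduced to a single scalar.
float ddot(const Vec& x, const Vec& y) {
    float s = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// WAXPBY: w = alpha*x + beta*y, element-wise.
void waxpby(float alpha, const Vec& x, float beta, const Vec& y, Vec& w) {
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] = alpha * x[i] + beta * y[i];
}

// SPMV is passed in as a callable; an ELLPACK version is sketched below.
template <typename Spmv>
int cg_solve(Spmv spmv, Vec& x, Vec& r, Vec& p, int max_iter, float tol) {
    Vec Ap(x.size());
    float rtrans = ddot(r, r);                       //                 (DDOT)
    int k = 0;
    for (; k < max_iter && rtrans > tol * tol; ++k) {
        spmv(p, Ap);                                 // Ap = A*p        (SPMV)
        float alpha = rtrans / ddot(p, Ap);          //                 (DDOT)
        waxpby(1.0f, x, alpha, p, x);                // x = x + a*p     (WAXPBY)
        waxpby(1.0f, r, -alpha, Ap, r);              // r = r - a*Ap    (WAXPBY)
        float rtrans_new = ddot(r, r);               //                 (DDOT)
        waxpby(1.0f, r, rtrans_new / rtrans, p, p);  // p = r + b*p     (WAXPBY)
        rtrans = rtrans_new;
    }
    return k;  // iterations used
}
```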
To ease analysis, we have made the following changes to the original code:
- Existing MPI calls and OpenMP pragmas have been removed.
- The matrix data layout has been changed to ELLPACK, which is more suitable for accelerators (see the sketch after this list).
- The precision has been downgraded from FP64 to FP32.
- Minor code improvements, such as the removal of unused return types.
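As a reference for the layout change, here is a minimal ELLPACK sketch (field names are illustrative, not the mini-app's exact data structures). ELLPACK pads every row to a fixed number of nonzeros, so values and column indices form dense, regularly strided arrays that map well to accelerators.

```cpp
// Minimal ELLPACK sketch (illustrative field names, not HPCCG's exact
// structs). Every row is padded to max_nnz_per_row entries, so both
// arrays are dense: entry (i, j) lives at index i * max_nnz_per_row + j.
#include <cstddef>
#include <vector>

struct EllMatrix {
    std::size_t rows;
    std::size_t max_nnz_per_row;  // fixed row width; short rows are padded
    std::vector<float> values;    // rows * max_nnz_per_row, padded with 0.0f
    std::vector<int>   col_idx;   // matching column indices (padding entries
                                  // point at any valid column, value 0)
};

// SPMV over ELLPACK: a regular 2D loop nest with unit-stride accesses,
// which is what makes the layout attractive for accelerators.
void spmv(const EllMatrix& A, const std::vector<float>& x,
          std::vector<float>& y) {
    for (std::size_t i = 0; i < A.rows; ++i) {
        float sum = 0.0f;
        for (std::size_t j = 0; j < A.max_nnz_per_row; ++j) {
            std::size_t k = i * A.max_nnz_per_row + j;
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[i] = sum;
    }
}
```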
The Tenstorrent Wormhole and Blackhole are PCIe-based AI accelerator cards. Each card contains many Tensix processors, and each Tensix processor comprises several RISC-V cores, local memory, and more specialized functional units.
Every Tensix processor has both a specialized matrix unit and a vector unit that deliver high-throughput computation on blocks of data. The RISC-V cores handle general-purpose work: they coordinate the hardware units and organize data transfers into and out of memory.
Floating-point precision varies with the specific hardware units:
- FP32 on the RISC-V cores via soft-float emulation.
- Near-IEEE 754-compliant FP32 on the vector unit (no subnormals, reduced overflow/underflow range).
- A hybrid format similar to TensorFloat-32 on the matrix unit (with the same divergences from IEEE 754 as the vector unit).
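To make the FP64-to-FP32 downgrade concrete, the following is a small, hypothetical host-side experiment (not part of this repository): accumulating a long dot product in FP32 loses digits relative to FP64, which is the kind of rounding error the solver's residual ultimately has to absorb.

```cpp
// Illustrative host-side experiment (not part of the repository):
// accumulate the same dot product in FP32 and FP64 and compare.
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t n = 1 << 20;
    std::vector<float> x(n, 0.1f), y(n, 0.1f);

    float  s32 = 0.0f;
    double s64 = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        s32 += x[i] * y[i];                       // FP32 accumulation
        s64 += static_cast<double>(x[i]) * y[i];  // FP64 accumulation
    }
    // The FP32 sum drifts as the accumulator grows; the relative error
    // indicates how much precision the downgrade costs for long reductions.
    std::printf("fp32=%.8f fp64=%.8f rel.err=%.2e\n",
                s32, s64, (s64 - s32) / s64);
}
```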
Since the conjugate gradient method is dominated by sparse matrix–vector multiplication (SPMV), the goal is to evaluate the accelerator's capabilities for sparse linear algebra while maintaining adequate numerical stability (solver convergence and residual quality). DOCC relies on four main components to port the application:
- A hand-tuned SPMV implementation for Tenstorrent using the matrix unit, compiled into a runtime library consumed by the HPCCG application.
- Einsum (dot-product) detection and automatic mapping to the vector unit via DOCC.
- Detection of data-parallel loops (e.g., waxpby) with naïve code generation targeting the RISC-V cores.
- ThinLTO and linker-based optimizations performed by DOCC to minimize data movement by automatically hoisting transfers outside the HPCCG solver loop (sketched after this list).
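As an illustration of the last point, here is a conceptual before/after of the transfer hoisting. The device API names are placeholders, not Tenstorrent's actual runtime; DOCC applies this rewrite automatically at link time, with no source changes required.

```cpp
// Conceptual sketch of transfer hoisting (placeholder device API names,
// not the actual Tenstorrent runtime).
#include <vector>

using Vec = std::vector<float>;
struct EllMatrix;  // as sketched above

// No-op stand-ins for the real host<->device transfer and kernel calls.
void copy_to_device(const void*) {}
void copy_from_device(void*) {}
void spmv_on_device(const EllMatrix&, const Vec&, Vec&) {}

void solve_offloaded(const EllMatrix& A, Vec& x, Vec& p, Vec& Ap,
                     int max_iter) {
    // Naively, every kernel call would re-upload A and p and download Ap
    // in each iteration. Once link-time analysis proves A is read-only in
    // the loop and the vectors stay device-resident, transfers hoist out:
    copy_to_device(&A);                // matrix uploaded once
    copy_to_device(&p);
    for (int k = 0; k < max_iter; ++k) {
        spmv_on_device(A, p, Ap);      // operates on device-resident data
        // ... ddot / waxpby likewise run on the device; only scalars
        // cross PCIe inside the loop ...
    }
    copy_from_device(&x);              // result downloaded once
}
```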

