PyGunrock API Reference#
High-performance GPU graph analytics using PyTorch tensors.
Installation#
Install from git repository:
CMAKE_ARGS="-DCMAKE_HIP_ARCHITECTURES=gfx942" pip install git+https://github.com/gunrock/gunrock.git#subdirectory=python
Or install from source:
cd python
CMAKE_ARGS="-DCMAKE_HIP_ARCHITECTURES=gfx942" pip install .
Requirements:
Python >= 3.9
ROCm/HIP (system installation)
CMake >= 3.25
nanobind >= 2.0.0
PyTorch >= 2.0.0 with ROCm support
Quick Start#
import torch
import gunrock
# Load graph from Matrix Market file
mm = gunrock.matrix_market_t()
properties, coo = mm.load("graph.mtx")
# Convert to CSR and build device graph
csr = gunrock.csr_t()
csr.from_coo(coo)
G = gunrock.build_graph(properties, csr)
# Create GPU context
context = gunrock.multi_context_t(0)
# Allocate output tensors on GPU
n = coo.number_of_rows
distances = torch.full((n,), float('inf'), dtype=torch.float32, device='cuda:0')
predecessors = torch.full((n,), -1, dtype=torch.int32, device='cuda:0')
# Run SSSP
elapsed_ms = gunrock.sssp(G, 0, distances, predecessors, context)
context.synchronize()
print(f"SSSP completed in {elapsed_ms:.2f} ms")
print(f"Distances: {distances.cpu()}")
Core Components#
Context Management#
Graph Structures#
Graph Formats#
CSR (Compressed Sparse Row)#
- class gunrock.csr_t#
- class gunrock.csr_t(rows, cols, nnz)
Compressed Sparse Row format.
- Parameters:
rows – Number of rows
cols – Number of columns
nnz – Number of non-zeros
- number_of_rows: int#
Number of rows in the matrix.
- number_of_columns: int#
Number of columns in the matrix.
- number_of_nonzeros: int#
Number of non-zero elements.
- from_coo(coo)#
Convert from COO format to CSR.
- Parameters:
coo (gunrock.coo_t) – COO format matrix
- read_binary(filename)#
Read CSR from binary file.
- Parameters:
filename (str) – Path to binary file
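For example (mirroring the Quick Start above), a csr_t is typically populated from a COO matrix loaded with matrix_market_t:
import gunrock
# Load a Matrix Market file into COO, then convert to CSR
mm = gunrock.matrix_market_t()
properties, coo = mm.load("graph.mtx")
csr = gunrock.csr_t()
csr.from_coo(coo)
print(csr.number_of_rows, csr.number_of_columns, csr.number_of_nonzeros)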
COO (Coordinate Format)#
CSC (Compressed Sparse Column)#
- class gunrock.csc_t#
- class gunrock.csc_t(rows, cols, nnz)
Compressed Sparse Column format.
- Parameters:
rows – Number of rows
cols – Number of columns
nnz – Number of non-zeros
- number_of_rows: int#
Number of rows in the matrix.
- number_of_columns: int#
Number of columns in the matrix.
- number_of_nonzeros: int#
Number of non-zero elements.
- gunrock.build_graph(properties, csr)#
Build a graph on GPU device from CSR format.
- Parameters:
properties (gunrock.graph_properties_t) – Graph properties
csr (gunrock.csr_t) – CSR format matrix
- Returns:
Graph object on device
- Return type:
gunrock.graph_t
Algorithms#
SSSP (Single-Source Shortest Path)#
- gunrock.sssp(graph, source, distances, predecessors, context=None, options=None)#
Run Single-Source Shortest Path algorithm with PyTorch tensors.
- Parameters:
graph (gunrock.graph_t) – Input graph
source (int) – Source vertex ID
distances (torch.Tensor) – Output distance tensor (float32, on GPU)
predecessors (torch.Tensor) – Output predecessor tensor (int32, on GPU)
context (gunrock.multi_context_t) – GPU context (optional, default: device 0)
options (gunrock.options_t) – Algorithm options (optional)
- Returns:
Elapsed time in milliseconds
- Return type:
float
Example:
import torch
import gunrock

# ... load graph and build G ...
context = gunrock.multi_context_t(0)

n = G.get_number_of_vertices()
distances = torch.full((n,), float('inf'), dtype=torch.float32, device='cuda:0')
predecessors = torch.full((n,), -1, dtype=torch.int32, device='cuda:0')

elapsed = gunrock.sssp(G, 0, distances, predecessors, context)
context.synchronize()

# Use results in PyTorch operations
reachable = torch.isfinite(distances)
print(f"Reachable vertices: {reachable.sum().item()}")
- class gunrock.sssp_param_t(single_source, options=None)#
Low-level SSSP algorithm parameters (for advanced use).
- Parameters:
single_source (int) – Source vertex ID
options (gunrock.options_t) – Algorithm options (optional)
BFS (Breadth-First Search)#
- gunrock.bfs(graph, source, distances, predecessors, context=None, options=None)#
Run Breadth-First Search algorithm with PyTorch tensors.
- Parameters:
graph (gunrock.graph_t) – Input graph
source (int) – Source vertex ID
distances (torch.Tensor) – Output distance tensor (int32, on GPU)
predecessors (torch.Tensor) – Output predecessor tensor (int32, on GPU)
context (gunrock.multi_context_t) – GPU context (optional, default: device 0)
options (gunrock.options_t) – Algorithm options (optional)
- Returns:
Elapsed time in milliseconds
- Return type:
float
Example:
import torch
import gunrock

# ... load graph and build G ...
context = gunrock.multi_context_t(0)

n = G.get_number_of_vertices()
distances = torch.full((n,), -1, dtype=torch.int32, device='cuda:0')
predecessors = torch.full((n,), -1, dtype=torch.int32, device='cuda:0')

elapsed = gunrock.bfs(G, 0, distances, predecessors, context)
context.synchronize()

# Use results in PyTorch operations
visited = distances >= 0
print(f"Visited vertices: {visited.sum().item()}")
- class gunrock.bfs_param_t(single_source, options=None)#
Low-level BFS algorithm parameters (for advanced use).
- Parameters:
single_source (int) – Source vertex ID
options (gunrock.options_t) – Algorithm options (optional)
BC (Betweenness Centrality)#
- class gunrock.bc_param_t(single_source, options=None)#
BC algorithm parameters.
- Parameters:
single_source (int) – Source vertex ID
options (gunrock.options_t) – Algorithm options (optional)
- class gunrock.bc_result_t(bc_values)#
BC algorithm results.
- Parameters:
bc_values – Output betweenness centrality values (float32 pointer)
- gunrock.bc_run(graph, param, result, context=None)#
Run Betweenness Centrality algorithm.
- Parameters:
graph (gunrock.graph_t) – Input graph
param (gunrock.bc_param_t) – Algorithm parameters
result (gunrock.bc_result_t) – Result structure
context (gunrock.multi_context_t) – GPU context (optional)
- Returns:
Elapsed time in milliseconds
- Return type:
float
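Example (a minimal sketch of the low-level calling pattern; how the raw bc_values pointer is supplied is an assumption here — we pass a GPU tensor's address via torch.Tensor.data_ptr(), so verify this against the bindings):
import torch
import gunrock

# ... load graph and build G ...
context = gunrock.multi_context_t(0)
n = G.get_number_of_vertices()

# Per-vertex centrality values (float32, on GPU)
bc_values = torch.zeros(n, dtype=torch.float32, device='cuda:0')

param = gunrock.bc_param_t(0)  # single_source = 0
result = gunrock.bc_result_t(bc_values.data_ptr())  # ASSUMPTION: raw device pointer

elapsed = gunrock.bc_run(G, param, result, context)
context.synchronize()
print(f"BC completed in {elapsed:.2f} ms")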
PR (PageRank)#
- class gunrock.pr_param_t(alpha=0.85, tol=1e-06, options=None)#
PageRank algorithm parameters.
- Parameters:
alpha (float) – Damping factor (default: 0.85)
tol (float) – Convergence tolerance (default: 1e-6)
options (gunrock.options_t) – Algorithm options (optional)
- class gunrock.pr_result_t(p)#
PageRank algorithm results.
- Parameters:
p – Output PageRank values (float32 pointer)
- gunrock.pr_run(graph, param, result, context=None)#
Run PageRank algorithm.
- Parameters:
graph (gunrock.graph_t) – Input graph
param (gunrock.pr_param_t) – Algorithm parameters
result (gunrock.pr_result_t) – Result structure
context (gunrock.multi_context_t) – GPU context (optional)
- Returns:
Elapsed time in milliseconds
- Return type:
float
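Example (a sketch under the same assumption as the BC sketch above — the result structure is given a GPU tensor's raw address via data_ptr()):
import torch
import gunrock

# ... load graph and build G ...
context = gunrock.multi_context_t(0)
n = G.get_number_of_vertices()

# PageRank values (float32, on GPU)
p = torch.zeros(n, dtype=torch.float32, device='cuda:0')

param = gunrock.pr_param_t(alpha=0.85, tol=1e-6)
result = gunrock.pr_result_t(p.data_ptr())  # ASSUMPTION: raw device pointer

elapsed = gunrock.pr_run(G, param, result, context)
context.synchronize()
print(f"Top-ranked vertex: {p.argmax().item()}")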
PPR (Personalized PageRank)#
- class gunrock.ppr_param_t(seed, alpha=0.85, epsilon=1e-06, options=None)#
Personalized PageRank algorithm parameters.
- Parameters:
seed (int) – Source vertex ID (seed vertex for personalized PageRank)
alpha (float) – Damping factor (default: 0.85)
epsilon (float) – Convergence tolerance (default: 1e-6)
options (gunrock.options_t) – Algorithm options (optional)
- class gunrock.ppr_result_t(p)#
PPR algorithm results.
- Parameters:
p – Output PPR values (float32 pointer)
- gunrock.ppr_run(graph, param, result, context=None)#
Run Personalized PageRank algorithm.
- Parameters:
graph (gunrock.graph_t) – Input graph
param (gunrock.ppr_param_t) – Algorithm parameters
result (gunrock.ppr_result_t) – Result structure
context (gunrock.multi_context_t) – GPU context (optional)
- Returns:
Elapsed time in milliseconds
- Return type:
float
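Example (same raw-pointer caveat as the sketches above):
import torch
import gunrock

# ... load graph and build G ...
context = gunrock.multi_context_t(0)
n = G.get_number_of_vertices()

# Personalized PageRank values (float32, on GPU)
p = torch.zeros(n, dtype=torch.float32, device='cuda:0')

param = gunrock.ppr_param_t(0, alpha=0.85, epsilon=1e-6)  # seed vertex 0
result = gunrock.ppr_result_t(p.data_ptr())  # ASSUMPTION: raw device pointer

elapsed = gunrock.ppr_run(G, param, result, context)
context.synchronize()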
TC (Triangle Counting)#
- class gunrock.tc_param_t(reduce_all_triangles=False, options=None)#
Triangle Counting algorithm parameters.
- Parameters:
reduce_all_triangles (bool) – Whether to reduce all triangles to a single count (default: False)
options (gunrock.options_t) – Algorithm options (optional)
- class gunrock.tc_result_t(vertex_triangles_count, total_triangles_count)#
TC algorithm results.
- Parameters:
vertex_triangles_count – Output per-vertex triangle counts (int32 pointer)
total_triangles_count – Output total triangle count (uint64 pointer)
- gunrock.tc_run(graph, param, result, context=None)#
Run Triangle Counting algorithm.
- Parameters:
graph (gunrock.graph_t) – Input graph
param (gunrock.tc_param_t) – Algorithm parameters
result (gunrock.tc_result_t) – Result structure
context (gunrock.multi_context_t) – GPU context (optional)
- Returns:
Elapsed time in milliseconds
- Return type:
float
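Example (a sketch; passing the two output buffers as raw device addresses via data_ptr() is an assumption, and an int64 tensor stands in for the documented uint64 total):
import torch
import gunrock

# ... load graph and build G ...
context = gunrock.multi_context_t(0)
n = G.get_number_of_vertices()

# Per-vertex triangle counts (int32, on GPU)
vertex_counts = torch.zeros(n, dtype=torch.int32, device='cuda:0')
# Documented as a uint64 pointer; an int64 buffer is used here as a stand-in (assumption)
total_count = torch.zeros(1, dtype=torch.int64, device='cuda:0')

param = gunrock.tc_param_t(reduce_all_triangles=True)
result = gunrock.tc_result_t(vertex_counts.data_ptr(), total_count.data_ptr())

elapsed = gunrock.tc_run(G, param, result, context)
context.synchronize()
print(f"Total triangles: {total_count.item()}")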
Color (Graph Coloring)#
- class gunrock.color_param_t(options=None)#
Graph Coloring algorithm parameters.
- Parameters:
options (gunrock.options_t) – Algorithm options (optional)
- class gunrock.color_result_t(colors)#
Graph Coloring algorithm results.
- Parameters:
colors – Output color assignments (int32 pointer)
- gunrock.color_run(graph, param, result, context=None)#
Run Graph Coloring algorithm.
- Parameters:
graph (gunrock.graph_t) – Input graph
param (gunrock.color_param_t) – Algorithm parameters
result (gunrock.color_result_t) – Result structure
context (gunrock.multi_context_t) – GPU context (optional)
- Returns:
Elapsed time in milliseconds
- Return type:
float
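Example (same raw-pointer caveat as the sketches above):
import torch
import gunrock

# ... load graph and build G ...
context = gunrock.multi_context_t(0)
n = G.get_number_of_vertices()

# Per-vertex color assignments (int32, on GPU)
colors = torch.empty(n, dtype=torch.int32, device='cuda:0')

param = gunrock.color_param_t()
result = gunrock.color_result_t(colors.data_ptr())  # ASSUMPTION: raw device pointer

elapsed = gunrock.color_run(G, param, result, context)
context.synchronize()
print(f"Colors used: {colors.max().item() + 1}")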
Additional Algorithms#
PyGunrock also includes low-level bindings for:
Geo (Graph Embedding): geo_param_t, geo_result_t, geo_run
HITS (Hyperlink-Induced Topic Search): hits_param_t (result and run not yet exposed)
K-Core: kcore_param_t, kcore_result_t, kcore_run
MST (Minimum Spanning Tree): mst_param_t, mst_result_t, mst_run
SpGEMM (Sparse Matrix-Matrix Multiplication): Not yet implemented
SpMV (Sparse Matrix-Vector Multiplication): Not yet implemented
These algorithms currently use the low-level API with result structures. PyTorch tensor interfaces are coming soon.
See the C++ API documentation for detailed parameter descriptions.
I/O Utilities#
Options and Configuration#
- class gunrock.options_t#
Algorithm optimization options.
- advance_load_balance: int#
Load balancing strategy for advance operator.
- enable_uniquify: bool#
Enable frontier uniquification (deduplication).
- best_effort_uniquify: bool#
Use best-effort uniquification (faster but less accurate).
- uniquify_percent: float#
Percentage threshold for uniquification.
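A minimal usage sketch (assuming options_t can be default-constructed; the uniquify_percent value below is just an illustrative placeholder):
import torch
import gunrock

# ... G, distances, predecessors, context set up as in the SSSP example ...
opts = gunrock.options_t()
opts.enable_uniquify = True
opts.uniquify_percent = 50.0  # illustrative placeholder value

# Pass via the options= keyword documented on the algorithm entry points
elapsed = gunrock.sssp(G, 0, distances, predecessors, context, options=opts)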
PyTorch Integration#
PyGunrock provides seamless integration with PyTorch:
Zero-Copy Memory Access
Tensors are allocated directly on GPU and passed to Gunrock without host-device transfers:
distances = torch.full((n,), float('inf'), dtype=torch.float32, device='cuda:0')
elapsed = gunrock.sssp(G, 0, distances, predecessors, context)
# distances now contains results on GPU
Direct PyTorch Operations
Results can be used immediately in PyTorch operations:
# Filter and analyze results
reachable = torch.isfinite(distances)
normalized = distances / distances[reachable].max()
histogram = torch.histc(distances[reachable], bins=10)
close_vertices = (distances <= threshold).nonzero()
# Statistics
print(f"Reachable: {reachable.sum().item()}")
print(f"Mean distance: {distances[reachable].mean().item():.2f}")
Important: Import Order
Always import PyTorch before gunrock for proper GPU initialization:
import torch # Import PyTorch FIRST
import gunrock # Then import gunrock
Examples#
See the python/examples/ directory for complete examples:
sssp.py: High-level SSSP usage with PyTorch tensors
pysssp.py: Framework demonstration with operator-based execution
Performance Tips#
Reuse contexts: Create one multi_context_t and reuse it across multiple algorithm runs.
Pre-allocate tensors: Allocate output tensors once and reuse them for multiple runs:

distances = torch.empty(n, dtype=torch.float32, device='cuda:0')
for source in sources:
    distances.fill_(float('inf'))
    elapsed = gunrock.sssp(G, source, distances, predecessors, context)

Keep data on device: Avoid unnecessary CPU transfers. Only call .cpu() when needed.
Use contiguous tensors: Ensure tensors are contiguous with .contiguous() if needed.
Synchronize explicitly: Call context.synchronize() after algorithm runs to ensure GPU operations complete before accessing results.
Batch operations: When running multiple algorithms on the same graph, build the graph once and reuse it.
Troubleshooting#
“No HIP GPUs are available” Error
Solution: Import PyTorch before gunrock:
import torch # First!
import gunrock # Second
This error occurs when gunrock is imported before PyTorch, preventing proper HIP initialization.
Build Error
Make sure nanobind is installed and specify your GPU architecture:
pip install nanobind
CMAKE_ARGS="-DCMAKE_HIP_ARCHITECTURES=gfx942" pip install .
Replace gfx942 with your GPU architecture (gfx90a for MI200, gfx908 for MI100, etc.).
Import Error
Ensure ROCm/HIP is installed and in your system path:
export ROCM_PATH=/opt/rocm
export PATH=$ROCM_PATH/bin:$PATH
export LD_LIBRARY_PATH=$ROCM_PATH/lib:$LD_LIBRARY_PATH
PyTorch Not Available
Check PyTorch installation with ROCm support:
python -c "import torch; print(torch.cuda.is_available())"
pip install torch --index-url https://download.pytorch.org/whl/rocm7.1
Runtime Error
Check GPU availability:
rocm-smi
python -c "import torch; import gunrock; ctx = gunrock.multi_context_t(0); print('GPU OK')"