PyGunrock API Reference#

High-performance GPU graph analytics that operates directly on PyTorch tensors.

Installation#

Install directly from the Git repository:

CMAKE_ARGS="-DCMAKE_HIP_ARCHITECTURES=gfx942" pip install git+https://github.com/gunrock/gunrock.git#subdirectory=python

Or install from a local clone of the repository (run from the repository root):

cd python
CMAKE_ARGS="-DCMAKE_HIP_ARCHITECTURES=gfx942" pip install .

Requirements:

  • Python >= 3.9

  • ROCm/HIP (system installation)

  • CMake >= 3.25

  • nanobind >= 2.0.0

  • PyTorch >= 2.0.0 with ROCm support

Quick Start#

import torch
import gunrock

# Load graph from Matrix Market file
mm = gunrock.matrix_market_t()
properties, coo = mm.load("graph.mtx")

# Convert to CSR and build device graph
csr = gunrock.csr_t()
csr.from_coo(coo)
G = gunrock.build_graph(properties, csr)

# Create GPU context
context = gunrock.multi_context_t(0)

# Allocate output tensors on GPU
n = coo.number_of_rows
distances = torch.full((n,), float('inf'), dtype=torch.float32, device='cuda:0')
predecessors = torch.full((n,), -1, dtype=torch.int32, device='cuda:0')

# Run SSSP
elapsed_ms = gunrock.sssp(G, 0, distances, predecessors, context)
context.synchronize()

print(f"SSSP completed in {elapsed_ms:.2f} ms")
print(f"Distances: {distances.cpu()}")

Core Components#

Context Management#

class gunrock.multi_context_t(device_id=0)#

GPU context for managing device operations.

Parameters:

device_id (int) – GPU device ID (default: 0)

synchronize()#

Synchronize all GPU operations.

Graph Structures#

class gunrock.graph_properties_t#

Graph properties descriptor.

directed: bool#

Whether the graph is directed.

weighted: bool#

Whether the graph has edge weights.

symmetric: bool#

Whether the graph is symmetric.
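
How PyGunrock decides the value of symmetric is not spelled out here. One common reading, used as an assumption in the sketch below, is that every edge (u, v) has a matching reverse edge (v, u). The helper name is_symmetric is hypothetical and works on a plain Python edge list, not on a PyGunrock object.

```python
def is_symmetric(edges):
    """Return True if every directed edge (u, v) has a matching (v, u)."""
    edge_set = set(edges)
    return all((v, u) in edge_set for (u, v) in edge_set)


# A 3-vertex path stored in both directions is symmetric.
undirected_path = [(0, 1), (1, 0), (1, 2), (2, 1)]
# Dropping one reverse edge makes it asymmetric.
one_way = [(0, 1), (1, 0), (1, 2)]

print(is_symmetric(undirected_path), is_symmetric(one_way))
```

A check like this can be run on a small input before loading it, to confirm what the file actually contains.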

class gunrock.graph_t#

Graph object on GPU device.

get_number_of_vertices()#

Get the number of vertices in the graph.

Returns:

Number of vertices

Return type:

int

get_number_of_edges()#

Get the number of edges in the graph.

Returns:

Number of edges

Return type:

int

Graph Formats#

CSR (Compressed Sparse Row)#

class gunrock.csr_t#
class gunrock.csr_t(rows, cols, nnz)

Compressed Sparse Row format.

Parameters:
  • rows – Number of rows

  • cols – Number of columns

  • nnz – Number of non-zeros

number_of_rows: int#

Number of rows in the matrix.

number_of_columns: int#

Number of columns in the matrix.

number_of_nonzeros: int#

Number of non-zero elements.

from_coo(coo)#

Convert from COO format to CSR.

Parameters:

coo (gunrock.coo_t) – COO format matrix
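
To make the conversion concrete, here is a minimal pure-Python sketch of a COO to CSR transformation on plain lists. It illustrates the data layout only and is not PyGunrock's implementation; the function name coo_to_csr and the choice to keep the input order within each row are assumptions.

```python
def coo_to_csr(num_rows, row_indices, col_indices, values):
    """Convert COO triplets into CSR arrays (row_offsets, column_indices, values)."""
    # Count the non-zeros in each row.
    row_offsets = [0] * (num_rows + 1)
    for r in row_indices:
        row_offsets[r + 1] += 1
    # A prefix sum turns the counts into starting offsets.
    for r in range(num_rows):
        row_offsets[r + 1] += row_offsets[r]
    # Scatter each entry into the next free slot of its row.
    next_slot = row_offsets[:-1].copy()
    csr_cols = [0] * len(col_indices)
    csr_vals = [0.0] * len(values)
    for r, c, v in zip(row_indices, col_indices, values):
        slot = next_slot[r]
        csr_cols[slot] = c
        csr_vals[slot] = v
        next_slot[r] += 1
    return row_offsets, csr_cols, csr_vals


# Edges: 0->1 (2.0), 2->0 (1.0), 0->2 (5.0), 1->2 (1.5)
offsets, cols, vals = coo_to_csr(3, [0, 2, 0, 1], [1, 0, 2, 2], [2.0, 1.0, 5.0, 1.5])
print(offsets, cols, vals)
```

The neighbors of vertex v are then cols[offsets[v]:offsets[v + 1]], which is the access pattern CSR is designed for.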

read_binary(filename)#

Read CSR from binary file.

Parameters:

filename (str) – Path to binary file

COO (Coordinate Format)#

class gunrock.coo_t#
class gunrock.coo_t(rows, cols, nnz)

Coordinate format.

Parameters:
  • rows – Number of rows

  • cols – Number of columns

  • nnz – Number of non-zeros

number_of_rows: int#
number_of_columns: int#
number_of_nonzeros: int#

CSC (Compressed Sparse Column)#

class gunrock.csc_t#
class gunrock.csc_t(rows, cols, nnz)

Compressed Sparse Column format.

Parameters:
  • rows – Number of rows

  • cols – Number of columns

  • nnz – Number of non-zeros

number_of_rows: int#
number_of_columns: int#
number_of_nonzeros: int#
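
gunrock.build_graph(properties, csr)#

Build a graph on GPU device from CSR format.

Parameters:
  • properties (gunrock.graph_properties_t) – Graph properties descriptor

  • csr (gunrock.csr_t) – CSR format matrix

Returns: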

Graph object on device

Return type:

gunrock.graph_t

Algorithms#

SSSP (Single-Source Shortest Path)#

gunrock.sssp(graph, source, distances, predecessors, context=None, options=None)#

Run Single-Source Shortest Path algorithm with PyTorch tensors.

Parameters:
  • graph (gunrock.graph_t) – Input graph

  • source (int) – Source vertex ID

  • distances (torch.Tensor) – Output distance tensor (float32, on GPU)

  • predecessors (torch.Tensor) – Output predecessor tensor (int32, on GPU)

  • context (gunrock.multi_context_t) – GPU context (optional, default: device 0)

  • options (gunrock.options_t) – Algorithm options (optional)

Returns:

Elapsed time in milliseconds

Return type:

float

Example:

import torch
import gunrock

# ... load graph and build G ...

context = gunrock.multi_context_t(0)
n = G.get_number_of_vertices()

distances = torch.full((n,), float('inf'), dtype=torch.float32, device='cuda:0')
predecessors = torch.full((n,), -1, dtype=torch.int32, device='cuda:0')

elapsed = gunrock.sssp(G, 0, distances, predecessors, context)
context.synchronize()

# Use results in PyTorch operations
reachable = torch.isfinite(distances)
print(f"Reachable vertices: {reachable.sum().item()}")
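
Once the predecessor tensor is copied to the host (for example with predecessors.cpu().tolist()), a shortest path can be rebuilt by walking the predecessor links backwards. The sketch below works on a plain list. It assumes that the source vertex and unreachable vertices keep the initial value -1; that sentinel comes from the initialization in the example above and is not a documented guarantee. The function name reconstruct_path is hypothetical.

```python
def reconstruct_path(predecessors, source, target):
    """Walk predecessor links from target back to source; return [] if unreachable."""
    path = [target]
    current = target
    # A valid path visits each vertex at most once, so cap the walk length.
    for _ in range(len(predecessors)):
        if current == source:
            return path[::-1]
        current = predecessors[current]
        if current < 0:  # assumed sentinel: -1 means "no predecessor"
            return []
        path.append(current)
    return path[::-1] if current == source else []


# Host-side copy of a predecessor array for the chain 0 -> 1 -> 2; vertex 3 is isolated.
preds = [-1, 0, 1, -1]
print(reconstruct_path(preds, 0, 2))
print(reconstruct_path(preds, 0, 3))
```

The cap on the loop length guards against malformed input that contains a cycle.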

class gunrock.sssp_param_t(single_source, options=None)#

Low-level SSSP algorithm parameters (for advanced use).

Parameters:
  • single_source (int) – Source vertex ID

  • options (gunrock.options_t) – Algorithm options (optional)

BC (Betweenness Centrality)#

class gunrock.bc_param_t(single_source, options=None)#

BC algorithm parameters.

Parameters:
  • single_source (int) – Source vertex ID

  • options (gunrock.options_t) – Algorithm options (optional)

class gunrock.bc_result_t(bc_values)#

BC algorithm results.

Parameters:

bc_values – Output betweenness centrality values (float32 pointer)

gunrock.bc_run(graph, param, result, context=None)#

Run Betweenness Centrality algorithm.

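Parameters:
  • graph (gunrock.graph_t) – Input graph

  • param (gunrock.bc_param_t) – BC algorithm parameters

  • result (gunrock.bc_result_t) – Output BC results

  • context (gunrock.multi_context_t) – GPU context (optional)

Returns: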

Elapsed time in milliseconds

Return type:

float

PR (PageRank)#

class gunrock.pr_param_t(alpha=0.85, tol=1e-06, options=None)#

PageRank algorithm parameters.

Parameters:
  • alpha (float) – Damping factor (default: 0.85)

  • tol (float) – Convergence tolerance (default: 1e-6)

  • options (gunrock.options_t) – Algorithm options (optional)

class gunrock.pr_result_t(p)#

PageRank algorithm results.

Parameters:

p – Output PageRank values (float32 pointer)

gunrock.pr_run(graph, param, result, context=None)#

Run PageRank algorithm.

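Parameters:
  • graph (gunrock.graph_t) – Input graph

  • param (gunrock.pr_param_t) – PageRank algorithm parameters

  • result (gunrock.pr_result_t) – Output PageRank results

  • context (gunrock.multi_context_t) – GPU context (optional)

Returns: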

Elapsed time in milliseconds

Return type:

float
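
For checking results on small graphs, a CPU reference is useful. The sketch below is a textbook power iteration that uses the same alpha and tol names as pr_param_t. It is not Gunrock's implementation: the uniform redistribution of rank from vertices without out-edges, the L1 convergence test, and the max_iters cap are assumptions, so values may differ from the GPU output in scale or in the last digits.

```python
def pagerank_reference(num_vertices, edges, alpha=0.85, tol=1e-6, max_iters=1000):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    out_degree = [0] * num_vertices
    for u, _ in edges:
        out_degree[u] += 1
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(max_iters):
        # Rank held by vertices without out-edges is spread uniformly (assumed convention).
        dangling = sum(rank[v] for v in range(num_vertices) if out_degree[v] == 0)
        base = (1.0 - alpha) / num_vertices + alpha * dangling / num_vertices
        new_rank = [base] * num_vertices
        for u, v in edges:
            new_rank[v] += alpha * rank[u] / out_degree[u]
        # Stop when the L1 change between iterations drops below tol.
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Two vertices pointing at vertex 0, which has no out-edges.
ranks = pagerank_reference(3, [(1, 0), (2, 0)])
print(ranks)
```

With this convention the ranks sum to 1, which makes it easy to spot a normalization difference when comparing against another implementation.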

PPR (Personalized PageRank)#

class gunrock.ppr_param_t(seed, alpha=0.85, epsilon=1e-06, options=None)#

Personalized PageRank algorithm parameters.

Parameters:
  • seed (int) – Source vertex ID (seed vertex for personalized PageRank)

  • alpha (float) – Damping factor (default: 0.85)

  • epsilon (float) – Convergence tolerance (default: 1e-6)

  • options (gunrock.options_t) – Algorithm options (optional)

class gunrock.ppr_result_t(p)#

PPR algorithm results.

Parameters:

p – Output PPR values (float32 pointer)

gunrock.ppr_run(graph, param, result, context=None)#

Run Personalized PageRank algorithm.

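Parameters:
  • graph (gunrock.graph_t) – Input graph

  • param (gunrock.ppr_param_t) – PPR algorithm parameters

  • result (gunrock.ppr_result_t) – Output PPR results

  • context (gunrock.multi_context_t) – GPU context (optional)

Returns: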

Elapsed time in milliseconds

Return type:

float

TC (Triangle Counting)#

class gunrock.tc_param_t(reduce_all_triangles=False, options=None)#

Triangle Counting algorithm parameters.

Parameters:
  • reduce_all_triangles (bool) – Whether to reduce all triangles to a single count (default: False)

  • options (gunrock.options_t) – Algorithm options (optional)

class gunrock.tc_result_t(vertex_triangles_count, total_triangles_count)#

TC algorithm results.

Parameters:
  • vertex_triangles_count – Output per-vertex triangle counts (int32 pointer)

  • total_triangles_count – Output total triangle count (uint64 pointer)

gunrock.tc_run(graph, param, result, context=None)#

Run Triangle Counting algorithm.

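Parameters:
  • graph (gunrock.graph_t) – Input graph

  • param (gunrock.tc_param_t) – TC algorithm parameters

  • result (gunrock.tc_result_t) – Output TC results

  • context (gunrock.multi_context_t) – GPU context (optional)

Returns: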

Elapsed time in milliseconds

Return type:

float
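
A small CPU reference helps to interpret the per-vertex and total counts. The sketch below treats the input as an undirected simple graph and counts each triangle exactly once in the total. Whether Gunrock reports each triangle once or once per participating vertex or edge is not stated here, so treat any constant-factor difference as a convention issue rather than a bug. The function name count_triangles is hypothetical.

```python
def count_triangles(num_vertices, edges):
    """Return (per_vertex_counts, total) for an undirected simple graph."""
    # Build symmetric neighbor sets, ignoring self-loops and duplicate edges.
    neighbors = [set() for _ in range(num_vertices)]
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    per_vertex = [0] * num_vertices
    total = 0
    for u in range(num_vertices):
        for v in neighbors[u]:
            if v <= u:
                continue
            # Count each triangle once, at its ordering u < v < w.
            for w in neighbors[u] & neighbors[v]:
                if w > v:
                    per_vertex[u] += 1
                    per_vertex[v] += 1
                    per_vertex[w] += 1
                    total += 1
    return per_vertex, total


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
counts, total = count_triangles(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
print(counts, total)
```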

Color (Graph Coloring)#

class gunrock.color_param_t(options=None)#

Graph Coloring algorithm parameters.

Parameters:

options (gunrock.options_t) – Algorithm options (optional)

class gunrock.color_result_t(colors)#

Graph Coloring algorithm results.

Parameters:

colors – Output color assignments (int32 pointer)

gunrock.color_run(graph, param, result, context=None)#

Run Graph Coloring algorithm.

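Parameters:
  • graph (gunrock.graph_t) – Input graph

  • param (gunrock.color_param_t) – Graph Coloring algorithm parameters

  • result (gunrock.color_result_t) – Output Graph Coloring results

  • context (gunrock.multi_context_t) – GPU context (optional)

Returns: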

Elapsed time in milliseconds

Return type:

float
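
A coloring is valid when no edge joins two vertices of the same color. The sketch below checks this on host-side lists and reports the number of distinct colors used. It assumes every vertex has been assigned a color; any sentinel value Gunrock may use for uncolored vertices is not covered. Both helper names are hypothetical.

```python
def is_valid_coloring(edges, colors):
    """Return True if no edge connects two distinct vertices of the same color."""
    return all(colors[u] != colors[v] for u, v in edges if u != v)


def count_colors(colors):
    """Return the number of distinct colors in the assignment."""
    return len(set(colors))


# A 4-cycle can be colored with two colors.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
good = [0, 1, 0, 1]
bad = [0, 0, 1, 1]
print(is_valid_coloring(cycle_edges, good), count_colors(good))
print(is_valid_coloring(cycle_edges, bad))
```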

Additional Algorithms#

PyGunrock also includes low-level bindings for:

  • Geo (Geolocation): geo_param_t, geo_result_t, geo_run

  • HITS (Hyperlink-Induced Topic Search): hits_param_t (result and run not yet exposed)

  • K-Core: kcore_param_t, kcore_result_t, kcore_run

  • MST (Minimum Spanning Tree): mst_param_t, mst_result_t, mst_run

  • SpGEMM (Sparse Matrix-Matrix Multiplication): Not yet implemented

  • SpMV (Sparse Matrix-Vector Multiplication): Not yet implemented

The algorithms in this list that are already exposed use the low-level API with parameter and result structures. PyTorch tensor interfaces for them are planned.

See the C++ API documentation for detailed parameter descriptions.

I/O Utilities#

class gunrock.matrix_market_t#

Matrix Market file reader.

load(filename)#

Load graph from Matrix Market file.

Parameters:

filename (str) – Path to .mtx file

Returns:

Tuple of (properties, coo_matrix)

Return type:

tuple
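
For experiments it helps to have a small input file at hand. The sketch below writes a weighted directed graph in Matrix Market coordinate format using only the standard library. The helper name write_matrix_market, the edge list, and the temporary path are illustrative choices and not part of PyGunrock.

```python
import os
import tempfile


def write_matrix_market(path, num_vertices, weighted_edges):
    """Write 0-based (src, dst, weight) edges as a general real coordinate matrix."""
    with open(path, "w") as f:
        f.write("%%MatrixMarket matrix coordinate real general\n")
        # Size line: rows, columns, number of stored entries.
        f.write(f"{num_vertices} {num_vertices} {len(weighted_edges)}\n")
        for src, dst, weight in weighted_edges:
            # Matrix Market indices are 1-based.
            f.write(f"{src + 1} {dst + 1} {weight}\n")


edges = [(0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.5), (2, 3, 1.0)]
mtx_path = os.path.join(tempfile.mkdtemp(), "graph.mtx")
write_matrix_market(mtx_path, 4, edges)
print(mtx_path)
```

The resulting path can then be passed to load(), for example properties, coo = gunrock.matrix_market_t().load(mtx_path).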

Options and Configuration#

class gunrock.options_t#

Algorithm optimization options.

advance_load_balance: int#

Load balancing strategy for advance operator.

enable_uniquify: bool#

Enable frontier uniquification (deduplication).

best_effort_uniquify: bool#

Use best-effort uniquification (faster, but may leave some duplicates in the frontier).

uniquify_percent: float#

Percentage threshold for uniquification.
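
As a rough sketch of how these fields might be set before a run, the helper below assigns the uniquification attributes on any object that exposes them. It assumes that the attributes listed above are writable from Python and that options_t can be constructed without arguments. The helper name configure_options and the value 50 are hypothetical, and a stand-in object is used so the code runs without a GPU.

```python
from types import SimpleNamespace


def configure_options(options, enable_uniquify=True,
                      best_effort_uniquify=False, uniquify_percent=None):
    """Set the uniquification fields on an options object and return it."""
    options.enable_uniquify = enable_uniquify
    options.best_effort_uniquify = best_effort_uniquify
    if uniquify_percent is not None:
        options.uniquify_percent = float(uniquify_percent)
    return options


# Stand-in object so the sketch runs without PyGunrock installed.
# With PyGunrock available, pass gunrock.options_t() instead (assumed usage).
stand_in = SimpleNamespace(enable_uniquify=False,
                           best_effort_uniquify=False,
                           uniquify_percent=0.0)
opts = configure_options(stand_in, enable_uniquify=True, uniquify_percent=50)
print(opts)
```

The configured object would then be passed through the options argument, as in gunrock.sssp(G, 0, distances, predecessors, context, opts).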

class gunrock.memory_space_t#

Memory space enumeration.

host#

Host (CPU) memory.

device#

Device (GPU) memory.

class gunrock.view_t#

Graph view enumeration.

csr#

CSR view.

csc#

CSC view.

coo#

COO view.

invalid#

Invalid view.

PyTorch Integration#

PyGunrock reads from and writes to PyTorch GPU tensors directly:

Zero-Copy Memory Access

Tensors are allocated directly on GPU and passed to Gunrock without host-device transfers:

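distances = torch.full((n,), float('inf'), dtype=torch.float32, device='cuda:0')
predecessors = torch.full((n,), -1, dtype=torch.int32, device='cuda:0')
elapsed = gunrock.sssp(G, 0, distances, predecessors, context)
context.synchronize()
# distances and predecessors now hold the results on the GPU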

Direct PyTorch Operations

Results can be used immediately in PyTorch operations:

# Filter and analyze results
reachable = torch.isfinite(distances)
normalized = distances / distances[reachable].max()
histogram = torch.histc(distances[reachable], bins=10)
threshold = 10.0  # example cutoff; choose a value that suits your edge weights
close_vertices = (distances <= threshold).nonzero()

# Statistics
print(f"Reachable: {reachable.sum().item()}")
print(f"Mean distance: {distances[reachable].mean().item():.2f}")

Important: Import Order

Always import PyTorch before gunrock for proper GPU initialization:

import torch  # Import PyTorch FIRST
import gunrock  # Then import gunrock
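
A quick heuristic for checking the import order in a running process is to compare where the two package names sit in sys.modules, which is an ordinary dict and therefore keeps insertion order. This is only an approximation of import order, and the helper name imported_before is hypothetical.

```python
import sys


def imported_before(first, second, modules=None):
    """Return True if `first` appears before `second` in the module table."""
    names = list(sys.modules if modules is None else modules)
    if first not in names or second not in names:
        return False
    return names.index(first) < names.index(second)


# Demonstration with stand-in tables, so no GPU packages are needed.
ok_order = {"torch": None, "gunrock": None}
bad_order = {"gunrock": None, "torch": None}
print(imported_before("torch", "gunrock", ok_order))
print(imported_before("torch", "gunrock", bad_order))
```

In a real session, calling imported_before("torch", "gunrock") without the third argument inspects the live module table.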

Examples#

See the python/examples/ directory for complete examples:

  • sssp.py: High-level SSSP usage with PyTorch tensors

  • pysssp.py: Framework demonstration with operator-based execution

Performance Tips#

  1. Reuse contexts: Create one multi_context_t and reuse it across multiple algorithm runs.

  2. Pre-allocate tensors: Allocate output tensors once and reuse them for multiple runs:

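    distances = torch.empty(n, dtype=torch.float32, device='cuda:0')
    predecessors = torch.empty(n, dtype=torch.int32, device='cuda:0')
    for source in sources:  # sources: iterable of source vertex IDs
        distances.fill_(float('inf'))
        predecessors.fill_(-1)
        elapsed = gunrock.sssp(G, source, distances, predecessors, context)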
    
  3. Keep data on device: Avoid unnecessary CPU transfers. Only call .cpu() when needed.

  4. Use contiguous tensors: Ensure tensors are contiguous with .contiguous() if needed.

  5. Synchronize explicitly: Call context.synchronize() after algorithm runs to ensure GPU operations complete before accessing results.

  6. Batch operations: When running multiple algorithms on the same graph, build the graph once and reuse it.
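
When a context and graph are reused across many runs, the elapsed times returned by each call can be collected and summarized. The sketch below uses only the standard library. The helper name summarize_timings is hypothetical, and the numbers in the demonstration are illustrative values, not measurements.

```python
import statistics


def summarize_timings(elapsed_ms):
    """Summarize a non-empty list of per-run elapsed times in milliseconds."""
    return {
        "runs": len(elapsed_ms),
        "min_ms": min(elapsed_ms),
        "median_ms": statistics.median(elapsed_ms),
        "mean_ms": statistics.fmean(elapsed_ms),
    }


# Illustrative values only; in practice append the return value of each
# gunrock.sssp(...) call to this list.
timings = [4.0, 2.0, 3.0]
summary = summarize_timings(timings)
print(summary)
```

Reporting the median alongside the mean makes the summary less sensitive to a slow first run, which may include one-time setup cost.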

Troubleshooting#

“No HIP GPUs are available” Error

Solution: Import PyTorch before gunrock:

import torch  # First!
import gunrock  # Second

This error occurs when gunrock is imported before PyTorch, preventing proper HIP initialization.

Build Error

Make sure nanobind is installed and specify your GPU architecture:

pip install nanobind
CMAKE_ARGS="-DCMAKE_HIP_ARCHITECTURES=gfx942" pip install .

Replace gfx942 with your GPU architecture (gfx90a for MI200, gfx908 for MI100, etc.).
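
The mapping from GPU family to architecture string can be kept in a small lookup table. The sketch below covers the two pairings named above and adds MI300 for gfx942; that last pairing is an addition not stated in this document. The table and the helper name cmake_args_for are hypothetical conveniences, not part of the build system.

```python
GFX_TARGETS = {
    "MI100": "gfx908",
    "MI200": "gfx90a",
    "MI300": "gfx942",  # addition: not listed in the text above
}


def cmake_args_for(gpu_family):
    """Return the CMAKE_ARGS value for a known GPU family name."""
    try:
        target = GFX_TARGETS[gpu_family.upper()]
    except KeyError:
        raise ValueError(f"unknown GPU family: {gpu_family!r}") from None
    return f"-DCMAKE_HIP_ARCHITECTURES={target}"


args = cmake_args_for("MI200")
print(f'CMAKE_ARGS="{args}" pip install .')
```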

Import Error

Ensure ROCm/HIP is installed and that its binaries and libraries are on your search paths:

export ROCM_PATH=/opt/rocm
export PATH=$ROCM_PATH/bin:$PATH
export LD_LIBRARY_PATH=$ROCM_PATH/lib:$LD_LIBRARY_PATH

PyTorch Not Available

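Check that PyTorch can see the GPU. If the first command prints False, reinstall a ROCm build of PyTorch (adjust the ROCm version in the URL to match your installation):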

python -c "import torch; print(torch.cuda.is_available())"
pip install torch --index-url https://download.pytorch.org/whl/rocm7.1

Runtime Error

Check GPU availability:

rocm-smi
python -c "import torch; import gunrock; ctx = gunrock.multi_context_t(0); print('GPU OK')"