Gunrock API Reference#

This page provides quick access to commonly-used algorithm and operator functions. For complete API documentation including all overloads, see the auto-generated API reference.

Algorithms#

BC (Betweenness Centrality)#

See bc.hxx for BC algorithm documentation with multiple overloads.

Color (Graph Coloring)#

See color.hxx for graph coloring algorithm documentation with multiple overloads.

Geo (Graph Embedding)#

gunrock::geo::run provides the following overloads (see geo.hxx for full documentation):

- template<typename graph_t> float run(graph_t &G, coordinates_t *coordinates, const unsigned int total_iterations, const unsigned int spatial_iterations = 1000, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
- template<typename graph_t> float run(graph_t &G, param_t &param, result_t &result, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
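
The pointer-based overload can be called as in the following sketch. It assumes a device graph G has already been built, that coordinates_t is the per-vertex coordinate struct declared in geo.hxx, and that G.get_number_of_vertices() sizes the output buffer; thrust/device_vector.h supplies the buffer type.

// Sketch only: run Geo embedding for 100 iterations on an existing device graph G.
thrust::device_vector<gunrock::geo::coordinates_t> coordinates(
    G.get_number_of_vertices());  // one coordinate per vertex (type assumed from geo.hxx)
float elapsed = gunrock::geo::run(G, coordinates.data().get(),
                                  /*total_iterations=*/100);  // spatial_iterations keeps its default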

HITS#

template<typename graph_t, typename result_t>
float gunrock::hits::run(graph_t &G, unsigned int max_iterations, result_t &result, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))#

Run the HITS (Hyperlink-Induced Topic Search) algorithm on a given graph to compute authority and hub scores for each vertex.

Template Parameters:
  • graph_t – Graph type.

  • result_t – Result type containing authority and hub score arrays.

Parameters:
  • G – Graph object.

  • max_iterations – Maximum number of iterations to run the algorithm.

  • result – Algorithm results containing authority and hub scores.

  • context – Device context.

Returns:

float Time taken to run the algorithm.
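
A minimal calling sketch, assuming a device graph G already exists; constructing the result object (which holds the authority and hub score arrays) is defined in hits.hxx and omitted here.

// Sketch only: run HITS for at most 20 iterations on an existing device graph G.
// `result` is a hits result object (see hits.hxx); after the call it holds the
// per-vertex authority and hub scores.
float elapsed = gunrock::hits::run(G, /*max_iterations=*/20, result);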

K-Core#

gunrock::kcore::run provides the following overloads (see kcore.hxx for full documentation):

- template<typename graph_t> float run(graph_t &G, int *k_cores, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
- template<typename graph_t> float run(graph_t &G, param_t &param, result_t<int> &result, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
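
A minimal calling sketch for the pointer-based overload, assuming a device graph G already exists and that G.get_number_of_vertices() sizes the output; thrust/device_vector.h supplies the buffer type.

// Sketch only: compute the k-core number of every vertex of an existing device graph G.
thrust::device_vector<int> k_cores(G.get_number_of_vertices());
float elapsed = gunrock::kcore::run(G, k_cores.data().get());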

MST (Minimum Spanning Tree)#

gunrock::mst::run provides the following overloads (see mst.hxx for full documentation):

- template<typename graph_t> float run(graph_t &G, param_t<typename graph_t::vertex_type> &param, result_t<typename graph_t::vertex_type, typename graph_t::weight_type> &result, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
- template<typename graph_t> float run(graph_t &G, typename graph_t::weight_type *mst_weight, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
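
A minimal calling sketch for the pointer-based overload, assuming a device graph G with float edge weights already exists; the single-element output buffer is assumed to be device-accessible.

// Sketch only: compute the total MST weight of an existing device graph G.
thrust::device_vector<float> mst_weight(1);  // single output value (float weights assumed)
float elapsed = gunrock::mst::run(G, mst_weight.data().get());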

PPR (Personalized PageRank)#

See ppr.hxx for PPR algorithm documentation with multiple overloads.

PR (PageRank)#

gunrock::pr::run provides the following overloads (see pr.hxx for full documentation):

- template<typename graph_t> float run(graph_t &G, param_t<typename graph_t::weight_type> &param, result_t<typename graph_t::weight_type> &result, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
- template<typename graph_t> float run(graph_t &G, typename graph_t::weight_type alpha, typename graph_t::weight_type tol, typename graph_t::weight_type *p, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
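
A minimal calling sketch for the pointer-based overload, assuming a device graph G with float edge weights already exists; thrust/device_vector.h supplies the output buffer.

// Sketch only: PageRank with damping factor 0.85 and tolerance 1e-6 on an
// existing device graph G (float weights assumed).
thrust::device_vector<float> p(G.get_number_of_vertices());  // per-vertex rank output
float elapsed = gunrock::pr::run(G, /*alpha=*/0.85f, /*tol=*/1e-6f, p.data().get());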

SpGEMM#

template<typename a_graph_t, typename b_graph_t, typename csr_t>
float gunrock::spgemm::run(a_graph_t &A, b_graph_t &B, csr_t &C, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))#

Run the Sparse Matrix-Matrix Multiplication (SpGEMM) algorithm to compute the product C = A * B, where A and B are sparse matrices represented as graphs.

Template Parameters:
  • a_graph_t – Type of the first input graph (matrix A).

  • b_graph_t – Type of the second input graph (matrix B).

  • csr_t – Type of the output CSR matrix (result C).

Parameters:
  • A – First input graph representing sparse matrix A.

  • B – Second input graph representing sparse matrix B.

  • C – Output CSR matrix to store the result C = A * B.

  • context – Device context.

Returns:

float Time taken to run the algorithm.
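
A minimal calling sketch, assuming two device graphs A and B already exist. The name csr_device_t below is a stand-in for whatever device CSR container holds the result; any type matching the csr_t template parameter works (see spgemm.hxx and the format headers).

// Sketch only: C = A * B for two existing device graphs A and B.
csr_device_t C;  // placeholder: a device CSR container for the product
float elapsed = gunrock::spgemm::run(A, B, C);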

SpMV#

gunrock::spmv::run provides the following overloads (see spmv.hxx for full documentation):

- template<typename graph_t> float run(graph_t &G, param_t<typename graph_t::weight_type> &param, result_t<typename graph_t::weight_type> &result, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
- template<typename graph_t> float run(graph_t &G, typename graph_t::weight_type *x, typename graph_t::weight_type *y, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
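
A minimal calling sketch for the pointer-based overload, assuming a device graph G with float weights already exists; thrust/device_vector.h supplies the vectors.

// Sketch only: y = G * x, treating the graph G as a sparse matrix (float weights assumed).
auto n = G.get_number_of_vertices();
thrust::device_vector<float> x(n, 1.0f);  // input vector
thrust::device_vector<float> y(n, 0.0f);  // output vector
float elapsed = gunrock::spmv::run(G, x.data().get(), y.data().get());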

SSSP (Single-Source Shortest Path)#

gunrock::sssp::run provides the following overloads (see sssp.hxx for full documentation):

- template<typename graph_t> float run(graph_t &G, param_t<typename graph_t::vertex_type> &param, result_t<typename graph_t::vertex_type, typename graph_t::weight_type> &result, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
- template<typename graph_t> float run(graph_t &G, typename graph_t::vertex_type &single_source, typename graph_t::weight_type *distances, typename graph_t::vertex_type *predecessors, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
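
A minimal calling sketch for the pointer-based overload, assuming a device graph G already exists with int vertex ids and float weights; thrust/device_vector.h supplies the output buffers.

// Sketch only: single-source shortest paths from vertex 0 on an existing device graph G.
auto n = G.get_number_of_vertices();
int single_source = 0;  // passed by non-const reference
thrust::device_vector<float> distances(n);
thrust::device_vector<int> predecessors(n);
float elapsed = gunrock::sssp::run(G, single_source, distances.data().get(),
                                   predecessors.data().get());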

TC (Triangle Counting)#

gunrock::tc::run provides the following overloads (see tc.hxx for full documentation):

- template<typename graph_t> float run(graph_t &G, bool reduce_all_triangles, typename graph_t::vertex_type *vertex_triangles_count, std::size_t *total_triangles_count, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
- template<typename graph_t> float run(graph_t &G, param_t<typename graph_t::vertex_type> &param, result_t<typename graph_t::vertex_type> &result, std::shared_ptr<gcuda::multi_context_t> context = std::shared_ptr<gcuda::multi_context_t>(new gcuda::multi_context_t(0)))
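
A minimal calling sketch for the pointer-based overload, assuming a device graph G already exists with int vertex ids; the single-element total count buffer is assumed to be device-accessible.

// Sketch only: per-vertex and total triangle counts on an existing device graph G.
auto n = G.get_number_of_vertices();
thrust::device_vector<int> vertex_triangles(n);          // per-vertex triangle counts
thrust::device_vector<std::size_t> total_triangles(1);   // reduced total
float elapsed = gunrock::tc::run(G, /*reduce_all_triangles=*/true,
                                 vertex_triangles.data().get(),
                                 total_triangles.data().get());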

Operators#

template<load_balance_t lb, advance_direction_t direction, advance_io_type_t input_type, advance_io_type_t output_type, typename graph_t, typename operator_t, typename frontier_t, typename work_tiles_t>
void gunrock::operators::advance::execute(graph_t &G, operator_t op, frontier_t *input, frontier_t *output, work_tiles_t &segments, gcuda::multi_context_t &context)#

An advance operator generates a new frontier from an input frontier by visiting the neighbors of the input frontier.

// C++ lambda operator to be used within advance.
auto sample = [=] __host__ __device__(
                        vertex_t const& source,    // ... source
                        vertex_t const& neighbor,  // neighbor
                        edge_t const& edge,        // edge
                        weight_t const& weight     // weight (tuple).
                        ) -> bool {
  return true; // keeps the neighbor in the output frontier.
};

// Execute advance operator on the provided lambda.
operators::advance::execute<operators::load_balance_t::block_mapped,
                            operators::advance_direction_t::forward,
                            operators::advance_io_type_t::vertex,
                            operators::advance_io_type_t::vertex>(
  G, sample, input_frontier, output_frontier, storage, context);
Overview

A frontier can consist of either vertices or edges, and an advance step can input and output either kind of frontier. The output frontier only contains the neighbors and not the source vertex/edges of the input frontier. Advance is an irregularly-parallel operation for two reasons:

  1. different vertices in a graph have different numbers of neighbors, and

  2. vertices share neighbors, so a vertex in an input frontier may map to multiple output items.

An efficient advance is the most significant challenge of a GPU implementation.

Example

The following code is a simple snippet on how to use advance within the enactor loop.

Note

Advance does not remove self-loops, i.e., a vertex can be a neighbor of itself.

Template Parameters:
  • lb – gunrock::operators::load_balance_t enum; determines which load-balancing algorithm to use when running advance.

  • direction – gunrock::operators::advance_direction_t enum; determines the direction when advancing the input frontier (forward, backward, both).

  • input_type – gunrock::operators::advance_io_type_t enum; whether the input frontier contains vertices or edges.

  • output_type – gunrock::operators::advance_io_type_t enum; whether the output frontier contains vertices or edges.

  • graph_t – gunrock::graph_t struct.

  • operator_t – Type of the lambda function being passed in.

  • frontier_t – gunrock::frontier_t.

  • work_tiles_t – Type of the storage space for scanned items.

Parameters:
  • G – input graph used to determine the neighboring vertex/edges of the input frontier.

  • op – lambda operator to apply to each source, neighbor, edge (and weight) tuple.

  • input – input frontier being passed in.

  • output – resultant output frontier containing the neighbors.

  • segments – storage space for scanned items (segment offsets).

  • context – a gcuda::multi_context_t that contains GPU contexts for the available CUDA devices. Used to launch the advance kernels.

template<load_balance_t lb = load_balance_t::merge_path, advance_direction_t direction = advance_direction_t::forward, advance_io_type_t input_type = advance_io_type_t::vertices, advance_io_type_t output_type = advance_io_type_t::vertices, typename graph_t, typename enactor_type, typename operator_type>
void gunrock::operators::advance::execute(graph_t &G, enactor_type *E, operator_type op, gcuda::multi_context_t &context, bool swap_buffers = true)#

An advance operator generates a new frontier from an input frontier by visiting the neighbors of the input frontier.

// C++ lambda operator to be used within advance.
auto sample = [=] __host__ __device__(
                        vertex_t const& source,    // ... source
                        vertex_t const& neighbor,  // neighbor
                        edge_t const& edge,        // edge
                        weight_t const& weight     // weight (tuple).
                        ) -> bool {
  return true; // keeps the neighbor in the output frontier.
};

// Execute advance operator on the provided lambda.
operators::advance::execute<operators::load_balance_t::block_mapped,
                            operators::advance_direction_t::forward,
                            operators::advance_io_type_t::vertex,
                            operators::advance_io_type_t::vertex>(
  G, E, sample, context);
Overview

A frontier can consist of either vertices or edges, and an advance step can input and output either kind of frontier. The output frontier only contains the neighbors and not the source vertex/edges of the input frontier. Advance is an irregularly-parallel operation for two reasons:

  1. different vertices in a graph have different numbers of neighbors, and

  2. vertices share neighbors, so a vertex in an input frontier may map to multiple output items.

An efficient advance is the most significant challenge of a GPU implementation.

Example

The following code is a simple snippet on how to use advance within the enactor loop.

Note

Advance does not remove self-loops, i.e., a vertex can be a neighbor of itself.

Template Parameters:
  • lb – gunrock::operators::load_balance_t enum; determines which load-balancing algorithm to use when running advance.

  • direction – gunrock::operators::advance_direction_t enum; determines the direction when advancing the input frontier (forward, backward, both).

  • input_type – gunrock::operators::advance_io_type_t enum; whether the input frontier contains vertices or edges.

  • output_type – gunrock::operators::advance_io_type_t enum; whether the output frontier contains vertices or edges.

  • graph_t – gunrock::graph_t struct.

  • enactor_type – Type of the enactor being passed in.

  • operator_type – Type of the lambda function being passed in.

Parameters:
  • G – input graph used to determine the neighboring vertex/edges of the input frontier.

  • E – gunrock enactor, holds the enactor pointers to input, output frontier buffers and storage space for work segments.

  • op – lambda operator to apply to each source, neighbor, edge (and weight) tuple.

  • context – a gcuda::multi_context_t that contains GPU contexts for the available CUDA devices. Used to launch the advance kernels.

  • swap_buffers – (default = true), swap input and output buffers of the enactor, such that the input buffer gets reused as the output buffer in the next iteration. Use false to disable the swap behavior.

template<filter_algorithm_t alg_type, typename graph_t, typename operator_t, typename frontier_t>
void gunrock::operators::filter::execute(graph_t &G, operator_t op, frontier_t *input, frontier_t *output, gcuda::multi_context_t &context)#

A filter operator generates a new frontier from an input frontier by removing elements in the frontier that do not satisfy a predicate.

auto sample = [=] __host__ __device__(
  vertex_t const& vertex) -> bool {
  return (vertex % 2 == 0); // keep even vertices
};

// Execute filter operator on the provided lambda
operators::filter::execute<operators::filter_algorithm_t::bypass>(
  G, sample, input_frontier, output_frontier, context);
Overview

A frontier consists of vertices or edges; a filter applied to an incoming frontier generates a new frontier by removing the elements that do not satisfy a user-defined predicate. The predicate function takes a vertex or edge and returns a boolean value: if true, the element is kept in the new frontier; if false, it is removed.

Example

The following code is a simple snippet on how to use filter operator with input and output frontiers.

Template Parameters:
  • alg_type – The filter algorithm to use.

  • graph_t – The graph type.

  • operator_t – The operator type.

  • frontier_t – The frontier type.

Parameters:
  • G – Input graph used.

  • op – Predicate function, can be defined using a C++ lambda function.

  • input – Input frontier.

  • output – Output frontier (some algorithms may not use this, and allow for in-place filter operation).

  • context – a gcuda::multi_context_t that contains GPU contexts for the available CUDA devices. Used to launch the filter kernels.

template<filter_algorithm_t alg_type, typename graph_t, typename enactor_type, typename operator_t>
void gunrock::operators::filter::execute(graph_t &G, enactor_type *E, operator_t op, gcuda::multi_context_t &context, bool swap_buffers = true)#

A filter operator generates a new frontier from an input frontier by removing elements in the frontier that do not satisfy a predicate.

auto sample = [=] __host__ __device__(
  vertex_t const& vertex) -> bool {
  return (vertex % 2 == 0); // keep even vertices
};

// Execute filter operator on the provided lambda
operators::filter::execute<operators::filter_algorithm_t::bypass>(
  G, E, sample, context);
Overview

A frontier consists of vertices or edges; a filter applied to an incoming frontier generates a new frontier by removing the elements that do not satisfy a user-defined predicate. The predicate function takes a vertex or edge and returns a boolean value: if true, the element is kept in the new frontier; if false, it is removed.

Example

The following code is a simple snippet on how to use filter within the enactor loop instead of with explicit input and output frontiers. The enactor interface automatically swaps the input and output frontiers after each iteration.

Template Parameters:
  • alg_type – The filter algorithm to use.

  • graph_t – The graph type.

  • enactor_type – The enactor type.

  • operator_t – The operator type.

Parameters:
  • G – Input graph used.

  • E – Enactor struct containing input and output frontiers.

  • op – Predicate function, can be defined using a C++ lambda function.

  • context – a gcuda::multi_context_t that contains GPU contexts for the available CUDA devices. Used to launch the filter kernels.

template<typename function_t, typename ...args_t>
void gunrock::operators::batch::execute(function_t f, std::size_t number_of_jobs, float *total_elapsed, args_t&... args)#

The batch operator takes a function and executes it in parallel across the desired number of jobs. The function is executed from the CPU using C++ std::threads.

// Lambda function to be executed in separate std::threads.
auto f = [&](std::size_t job_idx) -> float {
  // Function to run in batches, this can be an entire application
  // running on the GPU with different inputs (such as job_idx as a vertex).
  return run_function(G, (vertex_t)job_idx); // ... run another function.
};

// Execute the batch operator on the provided lambda function.
operators::batch::execute(f, 10, total_elapsed.data());
Overview

The batch operator takes a function and executes it in parallel for the given number of jobs. This is a very rudimentary implementation of the batch operator; it could be expanded further by calculating the available GPU resources, and possibly by using it to parallelize the batches onto multiple GPUs.

Example

The following code snippet is a simple snippet on how to use the batch operator.

Template Parameters:
  • function_t – The function type.

  • args_t – type of the arguments to the function.

Parameters:
  • f – The function to execute.

  • number_of_jobs – Number of jobs to execute.

  • total_elapsed – Pointer to an array of size 1 that stores the time taken to execute all the batches.

  • args – variadic arguments to be passed to the function (wip).

template<parallel_for_each_t type, typename func_t, typename frontier_t>
std::enable_if_t<type == parallel_for_each_t::element> gunrock::operators::parallel_for::execute(frontier_t &f, func_t op, gcuda::multi_context_t &context)#

For each element in the frontier, apply a user-defined function.

Parameters:
  • f – Frontier to apply the user-defined function to.

  • op – User-defined function.

  • context – Device context (see gcuda::multi_context_t).
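
A minimal calling sketch, assuming an existing vertex frontier (passed by reference), an existing gcuda::multi_context_t, int vertex ids, and a user function returning void; the exact operator signature is defined in the parallel_for header.

// Sketch only: apply a function to every element of an existing vertex frontier.
auto op = [=] __host__ __device__(int const& v) {
  // ... use the frontier element v (e.g. update a captured device array) ...
};
operators::parallel_for::execute<operators::parallel_for_each_t::element>(
    frontier, op, context);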

template<parallel_for_each_t type, typename func_t, typename graph_t>
std::enable_if_t<type != parallel_for_each_t::element> gunrock::operators::parallel_for::execute(graph_t &G, func_t op, gcuda::multi_context_t &context)#

For each vertex, edge or edge weight in the graph, apply a user-defined function.

Parameters:
  • G – Graph to apply the user-defined function to.

  • op – User-defined function.

  • context – Device context (see gcuda::multi_context_t).
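
A minimal calling sketch, assuming an existing device graph G, an existing gcuda::multi_context_t, int vertex ids, and a parallel_for_each_t::vertex enumerator (the edge and weight variants follow the same pattern).

// Sketch only: visit every vertex of G, e.g. to reset a captured device array.
operators::parallel_for::execute<operators::parallel_for_each_t::vertex>(
    G,
    [=] __host__ __device__(int const& v) {
      distances[v] = 0.0f;  // `distances` is a captured device pointer (assumed)
    },
    context);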

template<advance_io_type_t input_t = advance_io_type_t::graph, typename graph_t, typename enactor_t, typename output_t, typename operator_t, typename arithmetic_t>
void gunrock::operators::neighborreduce::execute(graph_t &G, enactor_t *E, output_t *output, operator_t op, arithmetic_t arithmetic_op, output_t init_value, gcuda::multi_context_t &context)#

Neighbor reduce operator that performs reduction on the segments of neighbors (or data associated with the neighbors), where each segment is defined by the source vertex.


Overview

Neighbor reduce operator, built on top of segmented reduction. This is a very limited approach to neighbor reduce, and only gives you the edge per advance. It’s only implemented on the entire graph (frontiers not yet supported).

Template Parameters:
  • input_t – Advance input type (advance_io_type_t::graph supported).

  • graph_t – Graph type.

  • enactor_t – Enactor type.

  • output_t – Output type.

  • operator_t – User-defined lambda function type.

  • arithmetic_t – Binary function type (arithmetic operator such as sum, max, min, etc.).

Parameters:
  • G – Graph to perform advance-reduce on.

  • E – Enactor structure (currently unused).

  • output – Output buffer.

  • op – User-defined lambda function.

  • arithmetic_op – Arithmetic operator (binary).

  • init_value – Initial value for the reduction.

  • context – CUDA context (see gcuda::multi_context_t).
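
A minimal calling sketch, with several assumptions: the user lambda receives an edge id (per the note above that the operator only gives you the edge per advance), edge_values is a captured device pointer of float per-edge data, E is the calling algorithm's enactor, and int/float are the edge and weight types.

// Sketch only: sum per-edge data over each vertex's neighbor segment.
thrust::device_vector<float> sums(G.get_number_of_vertices());
operators::neighborreduce::execute(
    G, E, sums.data().get(),
    [=] __host__ __device__(int const& e) -> float { return edge_values[e]; },
    [] __host__ __device__(float const& a, float const& b) { return a + b; },
    0.0f, context);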

template<uniquify_algorithm_t type, typename frontier_t>
void gunrock::operators::uniquify::execute(frontier_t *input, frontier_t *output, gcuda::multi_context_t &context, bool best_effort_uniquification = false, const float uniquification_percent = 100)#

Remove duplicate elements from a frontier using a specified uniquify algorithm.


Parameters:
  • input – Input frontier to uniquify.

  • output – Output frontier to store unique elements.

  • context – Device context (see gcuda::multi_context_t).

  • best_effort_uniquification – If true, skip sorting for best-effort uniquification (default: false).

  • uniquification_percent – Percentage of elements to uniquify (0-100, default: 100).
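
A minimal calling sketch, assuming existing input and output frontiers and an existing gcuda::multi_context_t.

// Sketch only: remove duplicates from a frontier using the `unique` algorithm.
operators::uniquify::execute<operators::uniquify_algorithm_t::unique>(
    &input_frontier, &output_frontier, context);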

template<uniquify_algorithm_t type = uniquify_algorithm_t::unique, typename enactor_type>
void gunrock::operators::uniquify::execute(enactor_type *E, gcuda::multi_context_t &context, bool best_effort_uniquification = false, const float uniquification_percent = 100, bool swap_buffers = true)#

Remove duplicate elements from the enactor’s frontier using a specified uniquify algorithm.


Parameters:
  • E – Enactor containing input and output frontiers.

  • context – Device context (see gcuda::multi_context_t).

  • best_effort_uniquification – If true, skip sorting for best-effort uniquification (default: false).

  • uniquification_percent – Percentage of elements to uniquify (0-100, default: 100).

  • swap_buffers – If true, swap input and output frontier buffers after execution (default: true).

Frontiers#

template<typename vertex_t, typename edge_t, frontier_kind_t _kind = frontier_kind_t::vertex_frontier, frontier_view_t _view = frontier_view_t::vector>
class frontier_t : public gunrock::frontier::vector_frontier_t<vertex_t, edge_t, frontier_kind_t::vertex_frontier>#
template<typename vertex_t, typename edge_t, frontier_kind_t _kind>
class vector_frontier_t#

Subclassed by gunrock::frontier::frontier_t< vertex_t, edge_t, _kind, _view >
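
A minimal sketch of declaring a frontier type from these templates, assuming 32-bit vertex and edge identifiers.

// Sketch only: a device vector-backed vertex frontier over int vertex/edge ids.
using my_frontier_t =
    gunrock::frontier::frontier_t<int, int,
                                  gunrock::frontier::frontier_kind_t::vertex_frontier,
                                  gunrock::frontier::frontier_view_t::vector>;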

enum gunrock::frontier::frontier_view_t#

Underlying frontier data structure.

Values:

enumerator vector#

vector-based frontier

enumerator bitmap#

bitmap-based frontier

enumerator boolmap#

boolmap-based frontier

enum gunrock::frontier::frontier_kind_t#

Type of frontier (vertex or edge)

Todo:

Use a better name than frontier_kind_t.

Values:

enumerator vertex_frontier#

vertex frontier storage only

enumerator edge_frontier#

edge frontier storage only

enumerator vertex_edge_frontier#

(wip)