Graph Operators#

Advance Operator#

template<load_balance_t lb, advance_direction_t direction, advance_io_type_t input_type, advance_io_type_t output_type, typename graph_t, typename operator_t, typename frontier_t, typename work_tiles_t>
void gunrock::operators::advance::execute(graph_t &G, operator_t op, frontier_t *input, frontier_t *output, work_tiles_t &segments, gcuda::multi_context_t &context)#

An advance operator generates a new frontier from an input frontier by visiting the neighbors of the input frontier.

// C++ lambda operator to be used within advance.
auto sample = [=] __host__ __device__(
                        vertex_t const& source,    // source of the edge
                        vertex_t const& neighbor,  // neighbor (destination)
                        edge_t const& edge,        // edge index
                        weight_t const& weight     // edge weight
                        ) -> bool {
  return true; // keeps the neighbor in the output frontier.
};

// Execute advance operator on the provided lambda.
operators::advance::execute<operators::load_balance_t::block_mapped,
                            operators::advance_direction_t::forward,
                            operators::advance_io_type_t::vertices,
                            operators::advance_io_type_t::vertices>(
  G, sample, input_frontier, output_frontier, storage, context);
Overview

A frontier can consist of either vertices or edges, and an advance step can input and output either kind of frontier. The output frontier contains only the neighbors, not the source vertices/edges of the input frontier. Advance is an irregularly-parallel operation for two reasons:

  1. different vertices in a graph have different numbers of neighbors and

  2. vertices share neighbors, so a vertex in an input frontier may map to multiple output items. An efficient advance is the most significant challenge of a GPU implementation.
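
For example, consider a toy graph (hypothetical) where neighbors(0) = {1, 2} and neighbors(1) = {2, 3}. A forward advance on the input vertex frontier {0, 1} produces the output frontier {1, 2, 2, 3}; vertex 2 appears twice because it is a shared neighbor.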

Example

The code snippet above shows how to use advance with explicit input and output frontiers.

Note

Advance does not remove self-loops, i.e., a vertex can be a neighbor of itself.

Template Parameters:
  • lb – gunrock::operators::load_balance_t enum; determines which load-balancing algorithm to use when running advance.

  • direction – gunrock::operators::advance_direction_t enum; determines the direction when advancing the input frontier (forward, backward, both).

  • input_type – gunrock::operators::advance_io_type_t enum; whether the input frontier contains vertices or edges.

  • output_type – gunrock::operators::advance_io_type_t enum; whether the output frontier contains vertices or edges (or none).

  • graph_t – gunrock::graph_t struct.

  • operator_t – type of the lambda function being passed in.

  • frontier_t – gunrock::frontier_t.

  • work_tiles_t – type of the storage space for scanned items.

Parameters:
  • G – input graph used to determine the neighboring vertices/edges of the input frontier.

  • op – lambda operator to apply to each source, neighbor, edge (and weight) tuple.

  • input – input frontier being passed in.

  • output – resultant output frontier containing the neighbors.

  • segments – storage space for scanned items (segment offsets).

  • context – a gcuda::multi_context_t that contains GPU contexts for the available CUDA devices. Used to launch the advance kernels.

template<load_balance_t lb = load_balance_t::merge_path, advance_direction_t direction = advance_direction_t::forward, advance_io_type_t input_type = advance_io_type_t::vertices, advance_io_type_t output_type = advance_io_type_t::vertices, typename graph_t, typename enactor_type, typename operator_type>
void gunrock::operators::advance::execute(graph_t &G, enactor_type *E, operator_type op, gcuda::multi_context_t &context, bool swap_buffers = true)#

An advance operator generates a new frontier from an input frontier by visiting the neighbors of the input frontier.

// C++ lambda operator to be used within advance.
auto sample = [=] __host__ __device__(
                        vertex_t const& source,    // source of the edge
                        vertex_t const& neighbor,  // neighbor (destination)
                        edge_t const& edge,        // edge index
                        weight_t const& weight     // edge weight
                        ) -> bool {
  return true; // keeps the neighbor in the output frontier.
};

// Execute advance operator on the provided lambda.
operators::advance::execute<operators::load_balance_t::block_mapped,
                            operators::advance_direction_t::forward,
                            operators::advance_io_type_t::vertices,
                            operators::advance_io_type_t::vertices>(
  G, E, sample, context);
Overview

A frontier can consist of either vertices or edges, and an advance step can input and output either kind of frontier. The output frontier contains only the neighbors, not the source vertices/edges of the input frontier. Advance is an irregularly-parallel operation for two reasons:

  1. different vertices in a graph have different numbers of neighbors and

  2. vertices share neighbors, so a vertex in an input frontier may map to multiple output items. An efficient advance is the most significant challenge of a GPU implementation.

Example

The code snippet above shows how to use advance within the enactor loop.

Note

Advance does not remove self-loops, i.e., a vertex can be a neighbor of itself.

Template Parameters:
  • lb – gunrock::operators::load_balance_t enum; determines which load-balancing algorithm to use when running advance.

  • direction – gunrock::operators::advance_direction_t enum; determines the direction when advancing the input frontier (forward, backward, both).

  • input_type – gunrock::operators::advance_io_type_t enum; whether the input frontier contains vertices or edges.

  • output_type – gunrock::operators::advance_io_type_t enum; whether the output frontier contains vertices or edges (or none).

  • graph_t – gunrock::graph_t struct.

  • enactor_type – type of the enactor being passed in.

  • operator_type – type of the lambda function being passed in.

Parameters:
  • G – input graph used to determine the neighboring vertices/edges of the input frontier.

  • E – gunrock enactor; holds pointers to the input and output frontier buffers and the storage space for work segments.

  • op – lambda operator to apply to each source, neighbor, edge (and weight) tuple.

  • context – a gcuda::multi_context_t that contains GPU contexts for the available CUDA devices. Used to launch the advance kernels.

  • swap_buffers – (default = true), swap input and output buffers of the enactor, such that the input buffer gets reused as the output buffer in the next iteration. Use false to disable the swap behavior.

enum gunrock::operators::load_balance_t#

Load balancing techniques options (for now, solely for advance).

Overview

The following enum defines the load-balancing techniques supported (or to be supported) within gunrock. Right now these techniques are only valid for advance, but we envision that future operators can also benefit from them. This enum can be passed into advance's template parameter list to select the respective underlying load-balanced advance kernel. The following table attempts to summarize these techniques: https://gist.github.com/neoblizz/fc4a3da5f4fc51f2f2a90753b5c49762

| Technique | Thread-Mapped | Block-Mapped | Warp-Mapped | Bucketing | Non-Zero Split | Merge-Path | Work Stealing |
|---|---|---|---|---|---|---|---|
| Summary | 1 element per thread | Block-sized elements per block | Warp-sized elements per warp | Buckets per threads, warps, blocks | Equal non-zeros per thread | Equal work-items (input and output) per thread | Steal work from threads when starving |
| Number of Scans | 1 | 2 | 2 | Unknown | Unknown | 1 | Unknown |
| Type of Scans | Device | Device (not required), Block | Device, Warp | Unknown | Unknown | Device | Unknown |
| Binary-Search | N/A | N/A | N/A | N/A | 1 | 2 | N/A |
| Static or Dynamic | Static | Static | Static | Dynamic | Static | Static | Dynamic |
| Overall Estimated Overhead | None | Minor | Minor | Medium | High | Very High | High |
| Quality of Balancing | Poor (data dependent) | HW block-scheduler dependent (fair) | HW warp-scheduler dependent (fair) | Good | Perfect non-zeros balance | Perfect input and output balance | Medium |
| Storage Requirement | Input size (for scan) | Input size (for scan) | Input size (for scan) | Unknown | Unknown | Unknown | Unknown |

Todo:

somehow make that gist part of the in-code comments.

Values:

enumerator thread_mapped#

1 element per thread

enumerator warp_mapped#

(wip) Equal # of elements per warp

enumerator block_mapped#

Equal # of elements per block.

enumerator bucketing#

(wip) Davidson et al. (SSSP)

enumerator merge_path#

Merrill & Garland (SpMV), ModernGPU implementation.

enumerator merge_path_v2#

Merrill & Garland (SpMV), custom implementation.

enumerator work_stealing#

(wip)
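
As a usage sketch, switching the load-balancing strategy is a single template-argument change to the advance call. The snippet below reuses the sample lambda, frontiers, storage, and context from the advance example above.

// Same advance call as before, but load-balanced with merge_path
// (Merrill & Garland, ModernGPU) instead of block_mapped.
operators::advance::execute<operators::load_balance_t::merge_path,
                            operators::advance_direction_t::forward,
                            operators::advance_io_type_t::vertices,
                            operators::advance_io_type_t::vertices>(
  G, sample, input_frontier, output_frontier, storage, context);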

enum gunrock::operators::advance_io_type_t#

Type of the input and output for advance, e.g., none implies that there will be no output from the advance.

Values:

enumerator graph#

Entire graph as an input frontier.

enumerator vertices#

Vertex input or output frontier.

enumerator edges#

Edge input or output frontier.

enumerator none#

No output frontier.
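
A sketch of how these types combine, reusing the enactor-based advance example above (the graph/none pairing is an assumption for illustration): graph as the input type visits every edge without materializing an input frontier, and none as the output type skips producing an output frontier, leaving only the lambda's side effects (e.g., atomic updates to per-vertex data).

// Advance over the entire graph, producing no output frontier.
operators::advance::execute<operators::load_balance_t::block_mapped,
                            operators::advance_direction_t::forward,
                            operators::advance_io_type_t::graph,
                            operators::advance_io_type_t::none>(
  G, E, sample, context);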

enum gunrock::operators::advance_direction_t#

Direction of the advance operator. Forward is push-based, backward is pull-based, and optimized switches between forward and backward based on a threshold.

Values:

enumerator forward#

Push-based approach.

enumerator backward#

Pull-based approach.

enumerator optimized#

Push-pull optimized.

Filter Operator#

template<filter_algorithm_t alg_type, typename graph_t, typename operator_t, typename frontier_t>
void gunrock::operators::filter::execute(graph_t &G, operator_t op, frontier_t *input, frontier_t *output, gcuda::multi_context_t &context)#

A filter operator generates a new frontier from an input frontier by removing elements in the frontier that do not satisfy a predicate.

auto sample = [=] __host__ __device__(
  vertex_t const& vertex) -> bool {
  return (vertex % 2 == 0); // keep even vertices.
};

// Execute filter operator on the provided lambda
operators::filter::execute<operators::filter_algorithm_t::bypass>(
  G, sample, input_frontier, output_frontier, context);
Overview

A frontier consists of vertices or edges; a filter applied to an incoming frontier generates a new frontier by removing the elements that do not satisfy a user-defined predicate. The predicate function takes a vertex or edge and returns a boolean. If the boolean is true, the element is kept in the new frontier; if false, it is removed.

Example

The code snippet above shows how to use the filter operator with explicit input and output frontiers.

Template Parameters:
  • alg_type – The filter algorithm to use.

  • graph_t – The graph type.

  • operator_t – The operator type.

  • frontier_t – The frontier type.

Parameters:
  • G – Input graph used.

  • op – Predicate function, can be defined using a C++ lambda function.

  • input – Input frontier.

  • output – Output frontier (some algorithms may not use this, and allow for in-place filter operation).

  • context – a gcuda::multi_context_t that contains GPU contexts for the available CUDA devices. Used to launch the filter kernels.

template<filter_algorithm_t alg_type, typename graph_t, typename enactor_type, typename operator_t>
void gunrock::operators::filter::execute(graph_t &G, enactor_type *E, operator_t op, gcuda::multi_context_t &context, bool swap_buffers = true)#

A filter operator generates a new frontier from an input frontier by removing elements in the frontier that do not satisfy a predicate.

auto sample = [=] __host__ __device__(
  vertex_t const& vertex) -> bool {
  return (vertex % 2 == 0); // keep even vertices.
};

// Execute filter operator on the provided lambda
operators::filter::execute<operators::filter_algorithm_t::bypass>(
  G, E, sample, context);
Overview

A frontier consists of vertices or edges; a filter applied to an incoming frontier generates a new frontier by removing the elements that do not satisfy a user-defined predicate. The predicate function takes a vertex or edge and returns a boolean. If the boolean is true, the element is kept in the new frontier; if false, it is removed.

Example

The code snippet above shows how to use filter within the enactor loop instead of with explicit input and output frontiers. The enactor interface automatically swaps the input and output frontiers after each iteration.

Template Parameters:
  • alg_type – The filter algorithm to use.

  • graph_t – The graph type.

  • enactor_type – The enactor type.

  • operator_t – The operator type.

Parameters:
  • G – Input graph used.

  • op – Predicate function, can be defined using a C++ lambda function.

  • E – Enactor struct containing input and output frontiers.

  • context – a gcuda::multi_context_t that contains GPU contexts for the available CUDA devices. Used to launch the filter kernels.

enum gunrock::operators::filter_algorithm_t#

Underlying filter algorithm to use.

Overview

Each of the following algorithm options trades overhead against the quality and storage requirements of the filtering process. For example, the cheapest algorithm, bypass, simply marks a frontier item as invalid instead of explicitly removing it from the frontier.

Values:

enumerator remove#

Remove if predicate = true.

enumerator predicated#

Copy if predicate = true.

enumerator compact#

Two-pass transform compact.

enumerator bypass#

Marks as invalid, instead of culling.
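
A sketch of the trade-off, reusing the sample predicate, frontiers, and context from the filter example above: bypass leaves failing items in place as invalid entries (cheap, but the frontier does not shrink), while compact physically removes them.

// Cheap filter: mark failing items as invalid; frontier size unchanged.
operators::filter::execute<operators::filter_algorithm_t::bypass>(
  G, sample, input_frontier, output_frontier, context);

// Compacting filter: physically remove failing items (two-pass transform).
operators::filter::execute<operators::filter_algorithm_t::compact>(
  G, sample, input_frontier, output_frontier, context);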

Batch Operator#

template<typename function_t, typename ...args_t>
void gunrock::operators::batch::execute(function_t f, std::size_t number_of_jobs, float *total_elapsed, args_t&... args)#

The batch operator takes a function and executes it in parallel, once per job, for the desired number of jobs. The jobs are launched from the CPU using C++ std::threads.

// Lambda function to be executed in separate std::threads.
auto f = [&](std::size_t job_idx) -> float {
  // Function to run in batches; this can be an entire application
  // running on the GPU with different inputs (such as job_idx as a vertex).
  return run_function(G, (vertex_t)job_idx); // ... run another function.
};

// Execute the batch operator on the provided lambda function;
// total_elapsed is an array holding a single float (see Parameters below).
operators::batch::execute(f, 10, total_elapsed.data());
Overview

The batch operator takes a function and executes it in parallel, launching number_of_jobs instances. This is a very rudimentary implementation of the batch operator; we can expand it further by calculating the available GPU resources, and possibly use it to parallelize the batches onto multiple GPUs.

Example

The code snippet above shows how to use the batch operator.

Template Parameters:
  • function_t – The function type.

  • args_t – type of the arguments to the function.

Parameters:
  • f – The function to execute.

  • number_of_jobs – Number of jobs to execute.

  • total_elapsed – pointer to a single-element float array that stores the time taken to execute all the batches.

  • args – variadic arguments to be passed to the function (wip).

Parallel-For Operator#

template<parallel_for_each_t type, typename func_t, typename frontier_t>
std::enable_if_t<type == parallel_for_each_t::element> gunrock::operators::parallel_for::execute(frontier_t &f, func_t op, gcuda::multi_context_t &context)#

For each element in the frontier, apply a user-defined function.

Template Parameters:
  • type – parallel_for_each_t enum; must be parallel_for_each_t::element for this overload.

  • func_t – user-defined function type.

  • frontier_t – frontier type.

Parameters:
  • f – Frontier to apply the user-defined function to.

  • op – User-defined function.

  • context – Device context (see gcuda::multi_context_t).

Returns:

bool – ignore the output; the bool return type exists only because a __device__ lambda limitation requires the std::enable_if_t template parameter to be named (see neoblizz/enable_if_bug).
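
No example accompanies this overload; the following is a minimal sketch. The argument the user-defined function receives (a frontier element) is an assumption here, and input_frontier is a hypothetical frontier pointer like the one in the advance example.

// Apply a user-defined function to every element of a frontier.
auto visit = [=] __host__ __device__(vertex_t const& v) {
  // ... use the frontier element v (e.g., update per-vertex data).
};

operators::parallel_for::execute<operators::parallel_for_each_t::element>(
  *input_frontier, visit, context);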

template<parallel_for_each_t type, typename func_t, typename graph_t>
std::enable_if_t<type != parallel_for_each_t::element> gunrock::operators::parallel_for::execute(graph_t &G, func_t op, gcuda::multi_context_t &context)#

For each vertex, edge or edge weight in the graph, apply a user-defined function.

Template Parameters:
  • type – parallel_for_each_t enum; one of vertex, edge, or weight for this overload.

  • func_t – user-defined function type.

  • graph_t – graph type.

Parameters:
  • G – Graph to apply the user-defined function to.

  • op – User-defined function.

  • context – Device context (see gcuda::multi_context_t).

Returns:

bool – ignore the output; the bool return type exists only because a __device__ lambda limitation requires the std::enable_if_t template parameter to be named (see neoblizz/enable_if_bug).
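
Similarly, a minimal sketch for the graph overload, assuming a hypothetical device array named distances holding per-vertex state:

// Apply a user-defined function to every vertex of the graph,
// e.g., to initialize per-vertex state.
auto init = [=] __host__ __device__(vertex_t const& v) {
  distances[v] = std::numeric_limits<weight_t>::max();
};

operators::parallel_for::execute<operators::parallel_for_each_t::vertex>(
  G, init, context);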

enum gunrock::operators::parallel_for_each_t#

Values:

enumerator vertex#

for each vertex in the graph

enumerator edge#

for each edge in the graph

enumerator weight#

for each weight in the graph

enumerator element#

for each element in the frontier

Neighbor Reduce Operator#

template<advance_io_type_t input_t = advance_io_type_t::graph, typename graph_t, typename enactor_t, typename output_t, typename operator_t, typename arithmetic_t>
void gunrock::operators::neighborreduce::execute(graph_t &G, enactor_t *E, output_t *output, operator_t op, arithmetic_t arithmetic_op, output_t init_value, gcuda::multi_context_t &context)#

Neighbor reduce is an operator that performs a reduction over segments of neighbors (or data associated with the neighbors), where each segment is defined by a source vertex. Another simple way to understand this operator is as an advance followed by a reduction on the resultant traversal.

Overview

The neighbor reduce operator is built on top of segmented reduction. This is a very limited approach to neighbor reduce: the user-defined lambda only receives the edge for each traversal, and the operator is only implemented on the entire graph (frontiers are not yet supported).

Template Parameters:
  • input_t – advance input type (only advance_io_type_t::graph is currently supported).

  • graph_t – graph type.

  • enactor_t – enactor type.

  • output_t – output type.

  • operator_t – user-defined lambda function.

  • arithmetic_t – binary function, arithmetic operator such as sum, max, min, etc.

Parameters:
  • G – graph to perform advance-reduce on.

  • E – enactor structure (currently unused).

  • output – output buffer.

  • op – user-defined lambda function.

  • arithmetic_op – arithmetic operator (binary).

  • init_value – initial value for the reduction.

  • context – CUDA context (see gcuda::multi_context_t).
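
No example accompanies neighbor reduce; the following is a minimal sketch of summing edge weights per source vertex. It assumes the user-defined lambda receives an edge index (consistent with the note above) and that vertex_sums is a hypothetical thrust::device_vector<weight_t> with one slot per vertex.

// For every vertex, reduce the weights of its outgoing edges with +.
auto weight_of = [=] __host__ __device__(edge_t const& e) -> weight_t {
  return G.get_edge_weight(e);  // value contributed by edge e.
};

operators::neighborreduce::execute<operators::advance_io_type_t::graph>(
  G, E, vertex_sums.data().get(), weight_of,
  thrust::plus<weight_t>(),  // binary arithmetic operator.
  (weight_t)0,               // initial value for each segment.
  context);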

Uniquify Operator#

template<uniquify_algorithm_t type, typename frontier_t>
void gunrock::operators::uniquify::execute(frontier_t *input, frontier_t *output, gcuda::multi_context_t &context, bool best_effort_uniquification = false, const float uniquification_percent = 100)#
template<uniquify_algorithm_t type = uniquify_algorithm_t::unique, typename enactor_type>
void gunrock::operators::uniquify::execute(enactor_type *E, gcuda::multi_context_t &context, bool best_effort_uniquification = false, const float uniquification_percent = 100, bool swap_buffers = true)#
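
A uniquify operator removes duplicate items from a frontier; the enactor overload operates on the enactor's frontier buffers and can swap them, like advance and filter above. The following is a minimal usage sketch with hypothetical frontier names, reusing the context from earlier examples.

// Deduplicate the input frontier into the output frontier. With
// uniquify_algorithm_t::unique, only consecutive duplicates are
// collapsed; a sort is used to reach 100% uniqueness.
operators::uniquify::execute<operators::uniquify_algorithm_t::unique>(
  input_frontier, output_frontier, context);
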
enum gunrock::operators::uniquify_algorithm_t#

Values:

enumerator unique#

Keep only the unique item for each consecutive group. Sort for 100% uniqueness.

enumerator unique_copy#

Copy the unique items for each consecutive group. Sort for 100% uniqueness.