Developers
This is a developer's guide for Gunrock, split into three major sections based on the depth of the work:
- Interface Developer's Guide
- Primitive/Application/Algorithm Developer's Guide
- Operator/Core Developer's Guide
Directory Structure
Regardless of the level of contribution you would like to make or develop towards, it is essential to understand Gunrock's source directory structure. Knowing where things are will help you decide where best to invest your time and where to make the changes needed for your particular problem.
[gunrock's root]
- .github -- GitHub-related files and folders, such as issue and pull request templates, can be found here.
- cmake -- CMake modules are located here; these allow us to download the googletest source code on the fly during the CMake build, and other modules look for the respective libraries and prerequisites.
- dataset -- Important gunrock datasets used in publications, performance testing, correctness checking and unit tests are located here.
  - large -- Large datasets are not stored in the GitHub repository; instead, we provide a Makefile to download the required dataset (or the entire collection) by simply running `make` in the respective subdirectory (or in the large directory for the whole collection).
    - [dataset]
  - small -- Generally fewer than 100 nodes (or a few hundred nodes), these datasets are good candidates for correctness checking and quick tests. If you are developing a primitive within gunrock, it is recommended to use these for testing purposes, and to conduct a much more comprehensive test/correctness check using the datasets under the large directory.
- docker -- Dockerfiles for different configurations. Simply run `docker build . -t gunrock` in one of the subdirectories, and start the docker session using `nvidia-docker run -it gunrock /bin/bash`.
- docs -- Documentation repository (https://github.com/gunrock/docs) used for gunrock's main website, available at https://gunrock.github.io/docs/.
- doxygen -- Doxygen-generated API docs.
- examples -- Application test drivers are located here, in the subdirectory of the respective application. The `main()` function in each `test_<application>.cu` file is responsible for setting the templated types and calling the CPU reference implementation as well as the Gunrock GPU implementation for the respective application.
  - [example application]
- externals -- External modules used by gunrock; run `git submodule init && git submodule update` to fetch them. These include moderngpu, CUB and rapidJSON.
- gunrock -- The core of gunrock's source code is located here; this includes the implementations of the different algorithms/primitives, the graph data structures, the graph loaders, operators, partitioners, and other useful utilities.
  - app -- Implementations of each application are located here (the driver is located under [root]/examples/[application name]).
    - hello -- A sample application used to guide developers through writing their own application within gunrock.
      - hello_app.cu -- High-level view of the application, used to call the problem initialization and reset, the enactor initialization and reset, and the validation routine. The external C interface for the respective application is also located within the `_app.cu` file.
      - hello_test.cuh -- The CPU reference code, the validation routine and a display/output function are all located in the `_test.cuh` file. Mainly used for correctness checking of the parallel implementation within gunrock and gunrock's programming model.
      - hello_problem.cuh -- An application's data goes here. Declaration, initialization, reset and extraction of the data happen in the `_problem.cuh` file.
      - hello_enactor.cuh -- The iteration loop, the operators and the actual algorithm implementation go in the `_enactor.cuh` file.
  - graph -- Supported graph data structures and their helper functions can be found here. An example of such a data structure is the Compressed Sparse Row (CSR) format, and an example of a helper function for the CSR format is `GetNeighborListLength()`, which returns the number of neighbors of a given node (see the short CSR sketch after this directory listing).
  - graphio -- Graph loader, labels loader, matrix-market reader, randomly generated graph (rgg) generator, etc.
  - oprtr -- Gunrock's operators, such as advance (visiting neighbors), filter (removing nodes), for, neighbor-reduce and intersection, are all located here in their respective folders. Load balancing for these operators is also done within this directory as part of the operator kernels. See `oprtr_parameters.cuh` for the available parameters for the operators, and `oprtr.cuh` for the API for these operators.
  - partitioner -- Graph partitioners, such as the metis, random and biased-random partitioners, are located here.
  - util -- An extensive list of supporting utilities for gunrock: device intrinsic primitives, array utilities, scan, reduction, and a lot more!
- python -- Simple Python bindings to the applications' C interfaces are located here, as examples that can be built on. Feel free to create a Python interface for any application you need.
- scripts -- Scripts for published papers, presentations and general performance testing are all located in the subdirectories.
- tools -- Useful scripts and tools to convert other graph file formats, such as snap or gr, to matrix-market format (mtx).
- unittests -- Googletest-based unit testing framework for gunrock.
  - [unittest]
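To make the CSR helper-function example above concrete, here is a minimal CPU-side sketch of the CSR layout and a degree query in the spirit of `GetNeighborListLength()`. This is an illustration only, not Gunrock's actual implementation; the `CsrSketch` struct and its member names are hypothetical.

    #include <cstdio>
    #include <vector>

    // Hypothetical CPU-side sketch of a CSR graph: row_offsets has
    // num_nodes + 1 entries, and the neighbors of node v occupy
    // col_indices[row_offsets[v] .. row_offsets[v + 1]).
    struct CsrSketch {
      std::vector<int> row_offsets;
      std::vector<int> col_indices;

      // Same idea as GetNeighborListLength(): a node's degree is the
      // difference between consecutive row offsets.
      int GetNeighborListLength(int v) const {
        return row_offsets[v + 1] - row_offsets[v];
      }
    };

    int main() {
      // The 7-node, 15-edge graph reused by the GoogleTest example later in this guide.
      CsrSketch g{{0, 3, 6, 9, 11, 14, 15, 15},
                  {1, 2, 3, 0, 2, 4, 3, 4, 5, 5, 6, 2, 5, 6, 6}};
      std::printf("degree of node 0: %d\n", g.GetNeighborListLength(0));  // prints 3
      return 0;
    }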
Git Forking Workflow
Gunrock is transitioning from the Git branching workflow suggested by Vincent Driessen at nvie to a Git forking workflow.
How the Forking Workflow Works
As in the other Git workflows, the Forking Workflow begins with an official public repository stored on a server. But when a new developer wants to start working on the project, they do not directly clone the official repository.
Instead, they fork the official repository to create a copy of it on the server. This new copy serves as their personal public repository—no other developers are allowed to push to it, but they can pull changes from it (we’ll see why this is important in a moment). After they have created their server-side copy, the developer performs a git clone to get a copy of it onto their local machine. This serves as their private development environment, just like in the other workflows.
When they're ready to publish a local commit, they push the commit to their own public repository—not the official one. Then, they file a pull request with the main repository, which lets the project maintainer know that an update is ready to be integrated. The pull request also serves as a convenient discussion thread if there are issues with the contributed code.
To integrate the feature into the official codebase, the maintainer pulls the contributor's changes into their local repository, checks to make sure it doesn't break the project, merges it into their local master branch, then pushes the master branch to the official repository on the server. The contribution is now part of the project, and other developers should pull from the official repository to synchronize their local repositories.
Gunrock's Forking Workflow
gunrock/gunrock:
- Master Branch: Reserved only for final releases and occasional bug fixes/patches.
- Dev Branch: The current working branch, to which all developers push their changes. This dev branch serves as the "next release" of gunrock, eliminating the need to manage individual branches for each feature and merge them when it is time for the release.
(personal-fork)/gunrock
- Feature Branch: This is the developer's personal fork with their feature branch. Whatever changes they would like to contribute to gunrock must be made in their own personal fork. Once it is time to create a pull request, it is done using a GitHub pull request; a reviewer checks it, and the changes are merged into the gunrock/gunrock dev branch.
Note that transitioning to this type of workflow from the branching model doesn't require much effort; we just have to start working on our forks and creating pull requests against the one dev branch.
How to contribute?
- Fork using GitHub: https://github.com/gunrock/gunrock
- Clone the main repository and point pushes at your fork:

    git clone --recursive https://github.com/gunrock/gunrock.git
    git remote set-url --push origin https://github.com/username/gunrock.git

  This ensures that you are pulling from `gunrock/gunrock` (staying updated with the main repository) but pushing to your own fork, `username/gunrock`.
- Stage, commit and push your changes to your fork:

    git add
    git commit -m "Describe your changes."
    git push

- Once you've pushed the changes to your fork, you can create a pull request on GitHub to merge the changes.
- The pull request will then be reviewed and merged into the `dev` branch.
GoogleTest for Gunrock
Recommended Read: Introduction: Why Google C++ Testing Framework?
When writing a good test, we would like to cover all possible functions (or execute all lines of code). What I recommend is to write a simple test, run code coverage on it, and use codecov.io to determine which lines are not executed. This gives you a good idea of what needs to be in the test and what you are missing.
What is code coverage?
Code coverage is a measurement used to express which lines of code were executed by a test suite. We use three primary terms to describe each line's execution status:
- hit indicates that the source code was executed by the test suite.
- partial indicates that the source code was not fully executed by the test suite; there are remaining branches that were not executed.
- miss indicates that the source code was not executed by the test suite.
Coverage is the ratio hits / (hits + partials + misses). A code base that has 5 of its 12 total lines executed by tests will receive a coverage ratio of 41% (rounding down).
Below is an example of which lines are a hit and a miss; you can target the missed lines in your tests to improve coverage.
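Consider this small hypothetical function (not from the Gunrock codebase) with a test suite that only ever calls `abs_val(5)`:

    // Hypothetical function under test; the only test calls abs_val(5).
    int abs_val(int x) {
      if (x >= 0)   // partial: only the true branch is ever taken
        return x;   // hit: executed by abs_val(5)
      return -x;    // miss: never executed, since no negative input is tested
    }

Adding a second case such as `abs_val(-5)` would turn the partial and the miss into hits and raise this function's coverage to 100%.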
Example Test Using GoogleTest
/**
* @brief BFS test for shared library advanced interface
* @file test_lib_bfs.h
*/
// Includes required for the test
#include "stdio.h"
#include "gunrock/gunrock.h"
#include "gmock/gmock.h"
#include "gtest/gtest.h"
// Add to gunrock's namespace
namespace gunrock {
/* Test function, test suite in this case is
* sharedlibrary and the test itself is breadthfirstsearch
*/
TEST(sharedlibrary, breadthfirstsearch)
{
struct GRTypes data_t; // data type structure
data_t.VTXID_TYPE = VTXID_INT; // vertex identifier
data_t.SIZET_TYPE = SIZET_INT; // graph size type
data_t.VALUE_TYPE = VALUE_INT; // attributes type
int srcs[3] = {0,1,2};
struct GRSetup *config = InitSetup(3, srcs); // gunrock configurations
int num_nodes = 7, num_edges = 15; // number of nodes and edges
int row_offsets[8] = {0, 3, 6, 9, 11, 14, 15, 15};
int col_indices[15] = {1, 2, 3, 0, 2, 4, 3, 4, 5, 5, 6, 2, 5, 6, 6};
struct GRGraph *grapho = (struct GRGraph*)malloc(sizeof(struct GRGraph));
struct GRGraph *graphi = (struct GRGraph*)malloc(sizeof(struct GRGraph));
graphi->num_nodes = num_nodes;
graphi->num_edges = num_edges;
graphi->row_offsets = (void*)&row_offsets[0];
graphi->col_indices = (void*)&col_indices[0];
gunrock_bfs(grapho, graphi, config, data_t);
int *labels = (int*)grapho->node_value1;  // BFS labels computed by gunrock
// IMPORTANT: expected output, stored in an array and compared against the
// generated labels to determine whether the test passed or failed
int result[7] = {2147483647, 2147483647, 0, 1, 1, 1, 2};
for (int i = 0; i < graphi->num_nodes; ++i) {
// IMPORTANT: Compare expected result with the generated labels
EXPECT_EQ(labels[i], result[i]) << "Vectors x and y differ at index " << i;
}
if (graphi) free(graphi);
if (grapho) free(grapho);
if (labels) free(labels);
}
} // namespace gunrock
To add your own test:

- Create a test_*.h file and place it in the appropriate directory inside /path/to/gunrock/tests/. I will be using test_lib_bfs.h as an example.
- In the tests/test.cpp file, add your test file as an include: #include "bfs/test_lib_bfs.h".
- In your test_*.h file, create a TEST() function, which takes two parameters: TEST(<test suite name>, <test name>).
- Use EXPECT and ASSERT to write the actual test itself; a commented example is provided above, and a short sketch of the EXPECT/ASSERT difference follows below.
- Now when you run the binary called unit_test, it will automatically run your test suite along with all the other google tests. This binary is automatically compiled when gunrock is built, and is found in /path/to/builddir/bin/unit_test.
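The difference between the two assertion families matters in practice: EXPECT_* records a failure and lets the test continue, while ASSERT_* records a failure and aborts the current test immediately. A minimal sketch (a hypothetical test, not part of Gunrock's suite):

    #include <vector>
    #include "gtest/gtest.h"

    TEST(examplesuite, expectversusassert) {
      std::vector<int> v = {1, 2, 3};
      // Hard check: if the size is wrong, the indexing below would be
      // invalid, so abort this test right here.
      ASSERT_EQ(v.size(), 3u);
      // Soft checks: the test keeps running even if one fails, so a
      // single run reports every mismatched element.
      EXPECT_EQ(v[0], 1);
      EXPECT_EQ(v[2], 3);
    }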
Final Remarks:
- I highly recommend reading the Primer document mentioned at the start of this tutorial. It explains in detail how to write a unit test using Google Test; this tutorial has focused more on how to incorporate one into Gunrock.
- Another interesting read is Measuring Coverage at Google.
From Gunrock to JSON
How do we export information from Gunrock?
Typical programs use printf to emit a bunch of unstructured information. As the program gets more sophisticated, printf is augmented with command-line switches and perhaps a configuration file, but random printf output remains hard to parse.
A more structured alternative is the JSON format. JSON is a nested dict (hash) data structure with arbitrary keys (and arbitrary nesting). It can be used to hold scalar, vector, and key-value data. Many tools can input and output JSON, which makes it a good choice for exporting information from a Gunrock program.
Ideally, we would declare a C++ struct or class and simply print it to stdout. The particular issue with C++, however, is that it poorly supports introspection: a running C++ executable does not know anything about the internals of the program that created it. Specifically, it doesn't know its own variable names, at least not without an incredible amount of pain. Maintaining a set of strings that map to variable names is undesirable since that can get out of sync.
Instead, we've elected to use a dict data structure that stores the JSON data, and we will write directly into it. We are using a header-only JSON generator based on Boost Spirit. It's used like this:
    json_spirit::mObject info;
    info["engine"] = "Gunrock";
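Because an `mObject` maps string keys to arbitrary JSON values, scalars, arrays, and nested objects can all go into the same structure. Below is a slightly fuller sketch; the keys are made up for illustration, and the header name and `write()` call assume the CodeProject json_spirit distribution, so they may differ in your copy:

    #include <iostream>
    #include "json_spirit_writer_template.h"  // header name may vary by distribution

    int main() {
      json_spirit::mObject info;
      info["engine"]  = "Gunrock";   // scalar (string)
      info["elapsed"] = 0.255088;    // scalar (double)

      json_spirit::mArray srcs;      // vector data
      srcs.push_back(0);
      srcs.push_back(1);
      info["srcs"] = srcs;

      json_spirit::mObject gpu;      // nested dict
      gpu["name"] = "some-gpu";      // hypothetical key and value
      info["gpuinfo"] = gpu;

      // Serialize the whole structure to a JSON string.
      std::cout << json_spirit::write(json_spirit::mValue(info)) << std::endl;
      return 0;
    }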
Currently we can output JSON data in one of three ways, controlled from the command line:
- `--json` writes the JSON structure to stdout.
- `--jsonfile=filename` writes the JSON structure to filename.
- `--jsondir=dir` writes the JSON structure to an automatically-uniquely-named file in the dir directory. This is the preferred option, since presumably there is a single directory where all JSON output is stored, and all Gunrock runs can use the same `--jsondir=dir` command-line option.
The current "automatically-uniquely-named file" producer creates name_dataset_time.json
. By design, the file name should not matter, so long as it is unique (and thus doesn't stomp on other files in the same directory when it's written). No program or person should rely on the contents of file names.
The current JSON structure (`info`) is passed by reference between various routines. Yuechao suggests that putting `info` into the global `Test_Parameter` is a better idea, where it can be passed into the enactor's and problem's `Init()` routines.
We don't have a fixed schema (yet), so what's below reflects what we put into the test_bfs code. Some of these are likely not useful for any analysis, but it's preferable to include too much info in the JSON output rather than not enough.
Fields that should be in any Gunrock run
- avg_duty. Average kernel running duty, calculated by kernel run time / kernel lifetime.
- command_line. Reflects how the program was run. Use `args.GetEntireCommandLine()`.
- dataset. Short name of the dataset, likely the stem of foo.mtx (in this case, foo). Examples: ak2010 or soc_liveJournal1. It is important to standardize dataset names to allow joins when compiling stats.
- elapsed. Elapsed runtime, e.g., 0.25508800148963928. Measured in ms.
- engine. Gunrock for Gunrock runs.
- git_commit_sha1. The git commit identifier, e.g., 6f775b82359c3c8529b8878cdae80e9dfbaf5330. (Strategy from StackOverflow, of course.) Settable by:

    #include
    info["git_commit_sha1"] = g_GIT_SHA1;

- gpuinfo. What GPU did we use? Use:

    #include
    Gpuinfo gpuinfo;
    info["gpuinfo"] = gpuinfo.getGpuinfo();

- gunrock_version. Reflects what's set in the top-level CMakeFiles; set as a constant #define using a compiler flag (-D) during compilation. (Since gunrock_version should not change often, this #define is acceptable; we only need to rebuild when we update the version.) Example: 0.1.0.

    #define STR(x) #x
    #define XSTR(x) STR(x)
    info["gunrock_version"] = XSTR(GUNROCKVERSION);
- iterations. The number of times we run the test, with runtime averaged over all runs. Yuduo suggests considering better names, e.g.:
  - search_depth. Number of BSP super-steps.
  - max_iter. Maximum allowed BSP super-steps, breaking after reaching this value.
  - iterations. Number of runs of a primitive.
- mark_predecessors. Set by command-line flag; true or false. Can be set for any primitive, as long as the primitive keeps track of predecessors when it does an advance.
- name. Name of the primitive; I reused the Stats name GPU BFS. BFS is probably more appropriate. We definitely need canonical names here.
- num_gpus. Integer. I think this only works with 1 now. We'll extend to multi-GPU later.
- quick. Set by command-line flag; true or false. I don't know what this is.
- sysinfo. What GPU/system did we use? Use:

    #include
    Sysinfo sysinfo;
    info["sysinfo"] = sysinfo.getSysinfo();

- time. When the test was run. This is a standard format from ctime(), e.g. Wed Jul 22 09:04:05 2015\n. (Oddly, the \n is part of the format.)

    time_t now = time(NULL);
    info["time"] = ctime(&now);

- userinfo. Who ran this test? Use:

    #include
    Userinfo userinfo;
    info["userinfo"] = userinfo.getUserinfo();

- verbose. Set by command-line flag; true or false. Presumably relates to logging output.
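To see how the gunrock_version and time snippets above fit together, here is a small self-contained sketch. Note that GUNROCKVERSION is defined inline here purely for illustration (in Gunrock it arrives via the compiler's -D flag), and the json_spirit writer header name is an assumption that may differ in your copy:

    #include <ctime>
    #include <iostream>
    #include "json_spirit_writer_template.h"  // writer header; name may vary

    #define STR(x) #x
    #define XSTR(x) STR(x)
    // Normally supplied on the compiler command line as -DGUNROCKVERSION=0.1.0
    #define GUNROCKVERSION 0.1.0

    int main() {
      json_spirit::mObject info;
      info["engine"] = "Gunrock";
      // Two-level macro: GUNROCKVERSION expands to 0.1.0 first,
      // then STR stringifies it, yielding "0.1.0".
      info["gunrock_version"] = XSTR(GUNROCKVERSION);
      // ctime()'s trailing '\n' ends up inside the JSON string.
      time_t now = time(NULL);
      info["time"] = ctime(&now);
      std::cout << json_spirit::write(json_spirit::mValue(info)) << std::endl;
      return 0;
    }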
Fields for any traversal-based primitive
- edges_visited. Self-explanatory.
- m_teps. Millions of edge-traversals per second, e.g., 0.40378222181564849.
- nodes_visited. Self-explanatory.
- redundant_work. Percentage indicating concurrent discovery, calculated by [(total edges put into the frontier - edges visited) / edges visited] * 100; for example, queueing 150 edges while visiting only 100 gives 50% redundant work. Actual code:

    // measure duplicate edges put through queue
    redundant_work = ((double)total_queued - edges_visited) / edges_visited;

- total_queued. Total elements put into the frontier.
BFS-specific fields
- impotence. Set by command-line flag; true or false.
- instrumented. Set by command-line flag; true or false.
- iterations. Not entirely sure what this indicates. Example: 10.
- max_grid_size. Number of grids for GPU kernels, but sometimes the enactor itself will calculate an optimal number and ignore this setting.
- max_queue_sizing. Used to calculate the initial size of the frontiers (#vertices * queue-sizing, or #edges * queue-sizing). --in-sizing serves a similar purpose, but for the buffers used by GPU-GPU communication.
- search_depth. Presumably the maximum distance from the source found in the graph. Integer.
- traversal_mode. Switch for advance kernels. 0: load-balanced partitioning (Davidson); 1: Merrill's load-balancing strategy. The default currently chooses dynamically between the two.
- undirected. Is the graph undirected (true) or directed (false)? Command-line option.
- vertex_id. Starting vertex ID. Integer.
Thread safety: "Using JSON Spirit with Multiple Threads"
"If you intend to use JSON Spirit in more than one thread, you will need to uncomment the following line near the top of json_spirit_reader.cpp.
"//#define BOOST_SPIRIT_THREADSAFE"
"In this case, Boost Spirit will require you to link against Boost Threads."
If compilation is too slow
Currently we're using the header-only version of JSON Spirit, which is easier to integrate but takes longer to compile. The [docs](http://www.codeproject.com/KB/recipes/JSON_Spirit.aspx#reduc) describe ways to increase compilation speed.
Setup and Use gunrock/io
gunrock/io can be used to generate visual representations of graph-engine performance, for example, Gunrock's. It takes the output of a graph algorithm run and can produce visual output in svg, pdf, png, html and md formats.
gunrock/io Dependencies
To use gunrock/io to produce visual output for any graph algorithm (as of Dec. 2016), the dependencies are:
- python 2.7.6
- gcc 4.8.4
- nodejs/node
- npm
- pandas 0.19.2
- numpy 1.11.3
- altair 1.2.0
- vega 2.6.3
- vega-lite 1.3.1
- inkscape 0.91
Below are the instructions to install these dependencies.
Assume the machine has the following environment setup:
- OS: Ubuntu 14.04
- Python 2.7.x
- GCC 4.8.x
Nodejs and Npm
First check whether node and npm are already installed. On the command line, type:

    node -v
    npm -v

If both print a version, move on to installing altair. Otherwise, install nodejs and npm as root:

    sudo apt-get install libcairo2-dev libjpeg8-dev libpango1.0-dev libgif-dev build-essential g++
    sudo apt-get install nodejs
    sudo apt-get install npm
    # Create a symbolic link for node, as many Node.js tools use this name to execute.
    sudo ln -s /usr/bin/nodejs /usr/bin/node
Install altair
    sudo pip install altair

If you have no root access, use the following commands instead:

    pip install --user altair
    vim ~/.bashrc
    # add the following to ~/.bashrc:
    HOME=/home/user
    PYTHONPATH=$HOME/.local/lib/python2.7/site-packages
    source ~/.bashrc
More altair dependencies to save figures
    npm install -g vega@2.6.3 vega-lite  # "-g" installs npm/node_modules in /usr/local/bin
    npm -g bin                           # returns the directory of installed binary files
    ls [returned directory]              # check that {vg2png vg2svg vl2png vl2svg vl2vg} exist in [returned directory]
If you have no root access, use the following commands instead:

    npm install vega@2.6.3 vega-lite  # npm install puts node_modules in the current directory
    npm bin                           # returns the directory of installed binary files
    ls [returned directory]           # check that {vg2png vg2svg vl2png vl2svg vl2vg} exist in /bin or /.bin
    # Open .bashrc and add:
    NPM_PACKAGES=/where/node_modules/folder/is/
    PATH=$NPM_PACKAGES/.bin:$PATH
    source ~/.bashrc
More dependencies to save figures as pdf: inkscape
    sudo add-apt-repository ppa:inkscape.dev/stable
    sudo apt-get update
    sudo apt-get install inkscape
How to use gunrock/io
With all the dependencies installed, here is how to use gunrock/io to reproduce the performance figures from the JSON files in gunrock/io:

1. Parse the engine outputs (in txt format) and generate JSON files containing the important information about the results, using text2json.py (instructions @ README).
2. Make a folder for the output visual-representation files.
3. Use the existing scripts to generate different visualization outputs from the JSON files. For example, altair_engines.py generates performance-comparison visualizations for different graph engines. Below is an example Makefile that generates engine performance-comparison figures as .md files in gunrock/doc:
    ENGINES_OUTPUTS = output/engines_topc.md \
                      output/engines_topc_table_html.md

    PLOTTING_FILES = fileops.py filters.py logic.py

    DEST = "../../gunrock/doc/stats"

    ALL = $(ENGINES_OUTPUTS)

    all: $(ALL)

    $(ENGINES_OUTPUTS): altair_engines.py $(PLOTTING_FILES)
        ./altair_engines.py

    install: $(ALL)
        cp $(ALL) $(DEST)

    clean:
        rm $(ALL)
After running make and make install, the output .md files will be copied into gunrock/doc/stats. In the output directory made in step 2, there will also be .html, .svg, .png, .pdf, .eps and .json output files. To write a new Python script that produces other visualization output, please follow the example script, altair_engines.py.
References:
- https://altair-viz.github.io/documentation/displaying.html
- http://wiki.inkscape.org/wiki/index.php/Installing_Inkscape
- https://altair-viz.github.io/installation.html
- http://www.hostingadvice.com/how-to/install-nodejs-ubuntu-14-04/