    Developers

    This is a developer's guide for Gunrock, split into three major sections based on the depth of the work:

    Directory Structure

    Regardless of the level of contribution you would like to make, it is essential to understand Gunrock's source directory structure. Knowing where things are will help you decide where best to invest your time and where to make the changes needed for your particular problem.

    [gunrock's root] - .github -- GitHub related files and folders, such as issue and pull request templates can be found here.

    Git Forking Workflow

    Gunrock is transitioning from the Git Branching Workflow suggested by Vincent Driessen at nvie to the Git Forking Workflow.

    How the Forking Workflow Works

    As in the other Git workflows, the Forking Workflow begins with an official public repository stored on a server. But when a new developer wants to start working on the project, they do not directly clone the official repository.

    Instead, they fork the official repository to create a copy of it on the server. This new copy serves as their personal public repository—no other developers are allowed to push to it, but they can pull changes from it (we’ll see why this is important in a moment). After they have created their server-side copy, the developer performs a git clone to get a copy of it onto their local machine. This serves as their private development environment, just like in the other workflows.

    When they're ready to publish a local commit, they push the commit to their own public repository—not the official one. Then, they file a pull request with the main repository, which lets the project maintainer know that an update is ready to be integrated. The pull request also serves as a convenient discussion thread if there are issues with the contributed code.

    To integrate the feature into the official codebase, the maintainer pulls the contributor’s changes into their local repository, checks to make sure it doesn’t break the project, merges it into their local master branch, then pushes the master branch to the official repository on the server. The contribution is now part of the project, and other developers should pull from the official repository to synchronize their local repositories.

    Gunrock's Forking Workflow

    gunrock/gunrock:

    (personal-fork)/gunrock

    Note that transitioning to this type of workflow from the branching model doesn't require much effort. We just have to start working on our forks and start creating pull requests to one dev branch.

    How to contribute?

    GoogleTest for Gunrock

    Recommended Read: Introduction: Why Google C++ Testing Framework?

    A good test should cover all possible functions (or execute all lines of code). I recommend writing a simple test, running code coverage on it, and using codecov.io to determine which lines are not executed. This gives you a good idea of what needs to be in the test and what you are missing.

    What is code coverage?

    Code coverage is a measurement used to express which lines of code were executed by a test suite. We use three primary terms to describe each line of code:

    • hit indicates that the source code was executed by the test suite.
    • partial indicates that the source code was not fully executed by the test suite; there are remaining branches that were not executed.
    • miss indicates that the source code was not executed by the test suite.

    Coverage is the ratio hit / (hit + partial + miss). A code base with 5 of its 12 lines executed by tests has a coverage ratio of 41% (5/12 ≈ 41.7%, rounded down).
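
As a quick check of the arithmetic above, the ratio can be computed with integer division, which rounds down. This is a minimal sketch only: the function name CoveragePercent is hypothetical (it is not part of Gunrock or codecov.io), and returning 0 for an empty code base is an assumption made here.

```cpp
#include <cassert>

// Coverage as a whole-number percentage, rounded down:
//   hit / (hit + partial + miss)
// Hypothetical helper for illustration; not a Gunrock or codecov.io API.
int CoveragePercent(int hit, int partial, int miss) {
  assert(hit >= 0 && partial >= 0 && miss >= 0);
  const int total = hit + partial + miss;
  if (total == 0) {
    return 0;  // assumption: report 0% when there is nothing to measure
  }
  return (100 * hit) / total;  // integer division truncates, i.e. rounds down
}
```

With 5 hits out of 12 lines, for example CoveragePercent(5, 0, 7), the function returns 41. Note that partial lines count toward the total but not toward the hits.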

    Below is an example of what lines are a hit and a miss; you can target the lines missed in the tests to improve coverage.

    Example CodeCov Stats

    Example Test Using GoogleTest

    /**
     * @brief BFS test for shared library advanced interface
     * @file test_lib_bfs.h
     */
    
    // Includes required for the test
    
    #include <stdio.h>
    #include <stdlib.h>  // malloc, free
    #include "gunrock/gunrock.h"
    #include "gmock/gmock.h"
    #include "gtest/gtest.h"
    
    // Add to gunrock's namespace
    namespace gunrock {
    
    /* Test function, test suite in this case is
     * sharedlibrary and the test itself is breadthfirstsearch
     */
    TEST(sharedlibrary, breadthfirstsearch)
    {
        struct GRTypes data_t;                 // data type structure
        data_t.VTXID_TYPE = VTXID_INT;         // vertex identifier
        data_t.SIZET_TYPE = SIZET_INT;         // graph size type
        data_t.VALUE_TYPE = VALUE_INT;         // attributes type
        int srcs[3] = {0,1,2};
    
        struct GRSetup *config = InitSetup(3, srcs);   // gunrock configurations
    
        int num_nodes = 7, num_edges = 15;  // number of nodes and edges
        int row_offsets[8]  = {0, 3, 6, 9, 11, 14, 15, 15};
        int col_indices[15] = {1, 2, 3, 0, 2, 4, 3, 4, 5, 5, 6, 2, 5, 6, 6};
    
        struct GRGraph *grapho = (struct GRGraph*)malloc(sizeof(struct GRGraph));
        struct GRGraph *graphi = (struct GRGraph*)malloc(sizeof(struct GRGraph));
        graphi->num_nodes   = num_nodes;
        graphi->num_edges   = num_edges;
        graphi->row_offsets = (void*)&row_offsets[0];
        graphi->col_indices = (void*)&col_indices[0];
    
        gunrock_bfs(grapho, graphi, config, data_t);
    
        // The labels computed by gunrock_bfs are read from grapho->node_value1;
        // allocating a separate buffer here would only leak it.
        int *labels = (int*)grapho->node_value1;
    
        // IMPORTANT: Expected output is stored in an array to compare against determining if the test passed or failed
        int result[7] = {2147483647, 2147483647, 0, 1, 1, 1, 2};
    
        for (int i = 0; i < graphi->num_nodes; ++i) {
          // IMPORTANT: Compare expected result with the generated labels
          EXPECT_EQ(labels[i], result[i]) << "labels and result differ at index " << i;
        }
    
        if (graphi) free(graphi);
        if (grapho) free(grapho);
        if (labels) free(labels);
    
    }
    } // namespace gunrock
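
The expected array in the test above can be cross-checked with a small CPU reference traversal over the same CSR arrays. The sketch below is not Gunrock code: ReferenceBfs is a hypothetical helper, and reading the expected labels as a traversal from source vertex 2 (the last entry of srcs) is an inference from the data rather than something the source states.

```cpp
#include <cassert>
#include <climits>
#include <queue>
#include <vector>

// Hypothetical CPU reference BFS over a CSR graph (not part of Gunrock).
// Returns the hop distance from src for every vertex; vertices that are
// never reached keep INT_MAX (2147483647 for a 32-bit int).
std::vector<int> ReferenceBfs(int num_nodes, const int *row_offsets,
                              const int *col_indices, int src) {
  assert(src >= 0 && src < num_nodes);
  std::vector<int> labels(num_nodes, INT_MAX);
  std::queue<int> frontier;
  labels[src] = 0;
  frontier.push(src);
  while (!frontier.empty()) {
    const int v = frontier.front();
    frontier.pop();
    // Neighbors of v are col_indices[row_offsets[v] .. row_offsets[v + 1]).
    for (int e = row_offsets[v]; e < row_offsets[v + 1]; ++e) {
      const int u = col_indices[e];
      if (labels[u] == INT_MAX) {  // first visit fixes the shortest distance
        labels[u] = labels[v] + 1;
        frontier.push(u);
      }
    }
  }
  return labels;
}
```

Called with the row_offsets and col_indices from the test and src = 2, this returns the same seven values as the result array: vertices 0 and 1 are unreachable from vertex 2, so they keep INT_MAX.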
    
    1. Create a test_.h file and place it in the appropriate directory inside /path/to/gunrock/tests/. I will be using test_lib_bfs.h as an example.

    2. In the tests/test.cpp file, add your test file as an include: #include "bfs/test_lib_bfs.h".

    3. In your test_.h file, create a TEST() function, which takes two parameters: the test suite name and the test name. In the example, this is TEST(sharedlibrary, breadthfirstsearch).

    4. Use EXPECT and ASSERT to write the actual test itself. A commented example appears above, under "Example Test Using GoogleTest".

    5. Now when you run the binary called unit_test, it will automatically run your test suite along with all other GoogleTest tests. This binary is automatically compiled when Gunrock is built, and is found in /path/to/builddir/bin/unit_test.

    Final Remarks:

    From Gunrock to JSON

    How do we export information from Gunrock?

    Typical programs use "printf" to emit a bunch of unstructured information. As the program gets more sophisticated, "printf" is augmented with command-line switches, perhaps a configuration file, but it's hard to easily parse random printf output.

    JSON is a more structured format. It is a nested dict (hash) data structure with arbitrary keys (and arbitrary nesting). It can be used to hold scalar, vector, and key-value data. Many tools can input and output JSON. It is a good choice for exporting information from a Gunrock program.

    Ideally, we would declare a C++ struct or class and simply print it to stdout. The particular issue with C++, however, is that it poorly supports introspection: a running C++ executable does not know anything about the internals of the program that created it. Specifically, it doesn't know its own variable names, at least not without an incredible amount of pain. Maintaining a set of strings that map to variable names is undesirable since that can get out of sync.

    Instead, we've elected to use a dict data structure that stores the JSON data, and we will write directly into it. We are using a header-only JSON generator based on Boost Spirit. It's used like this:

        json_spirit::mObject info;
        info["engine"] = "Gunrock";
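
To make the "write directly into a dict" idea concrete without depending on JSON Spirit, here is a minimal stand-in built on std::map. Everything in it is an assumption for illustration: the Info typedef, SetString, SetInt, ToJson, and the num_vertices key are hypothetical names, it handles only flat string and integer fields, and it is not how JSON Spirit itself is implemented.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for a JSON dict: key -> already-encoded JSON value.
typedef std::map<std::string, std::string> Info;

// Escapes quotes, backslashes, newlines, and tabs. Other control characters
// are passed through, so this is not a complete JSON string encoder.
std::string EscapeJson(const std::string &s) {
  std::ostringstream out;
  for (std::string::size_type i = 0; i < s.size(); ++i) {
    switch (s[i]) {
      case '"':  out << "\\\""; break;
      case '\\': out << "\\\\"; break;
      case '\n': out << "\\n";  break;
      case '\t': out << "\\t";  break;
      default:   out << s[i];
    }
  }
  return out.str();
}

// Store a string field, quoted and escaped.
void SetString(Info &info, const std::string &key, const std::string &value) {
  assert(!key.empty());
  info[key] = "\"" + EscapeJson(value) + "\"";
}

// Store an integer field as a bare JSON number.
void SetInt(Info &info, const std::string &key, long value) {
  assert(!key.empty());
  std::ostringstream out;
  out << value;
  info[key] = out.str();
}

// Serialize the dict as one JSON object; std::map iterates keys in sorted order.
std::string ToJson(const Info &info) {
  std::ostringstream out;
  out << "{";
  for (Info::const_iterator it = info.begin(); it != info.end(); ++it) {
    if (it != info.begin()) out << ",";
    out << "\"" << EscapeJson(it->first) << "\":" << it->second;
  }
  out << "}";
  return out.str();
}
```

After SetString(info, "engine", "Gunrock") and SetInt(info, "num_vertices", 7), ToJson(info) returns {"engine":"Gunrock","num_vertices":7}. The point of the design is that field names live in one place, the dict, instead of being mirrored by hand from struct members.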

    Currently we can output JSON data in one of three ways, controlled from the command line:

    The current "automatically-uniquely-named file" producer creates name_dataset_time.json. By design, the file name should not matter, so long as it is unique (and thus doesn't stomp on other files in the same directory when it's written). No program or person should rely on the contents of file names.
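
The name_dataset_time.json pattern can be sketched as a plain string join. This is only an illustration: JsonFileName is a hypothetical helper, and the timestamp is passed in as an opaque string because the source does not specify its format.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: joins the three parts as name_dataset_time.json.
// The timestamp format is an assumption left to the caller; uniqueness
// depends on it differing between runs.
std::string JsonFileName(const std::string &name, const std::string &dataset,
                         const std::string &time_stamp) {
  assert(!name.empty() && !dataset.empty() && !time_stamp.empty());
  return name + "_" + dataset + "_" + time_stamp + ".json";
}
```

Consistent with the note above, callers should treat the result as an opaque unique name and read the run details from the JSON contents instead.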

    The current JSON structure (info) is passed by reference between various routines. Yuechao suggests that putting info into the global Test_Parameter is a better idea, where it can be passed into the enactor's and problem's Init() routines.

    We don't have a fixed schema (yet), so what's below reflects what we put into the test_bfs code. Some of these are likely not useful for any analysis, but it's preferable to include too much info in the JSON output rather than not enough.

    Fields that should be in any Gunrock run

    Fields for any traversal-based primitive

    BFS-specific fields

    Thread safety: "Using JSON Spirit with Multiple Threads"

    "If you intend to use JSON Spirit in more than one thread, you will need to uncomment the following line near the top of json_spirit_reader.cpp.

    "//#define BOOST_SPIRIT_THREADSAFE"

    "In this case, Boost Spirit will require you to link against Boost Threads."


    If compilation is too slow

    Currently we're using the header-only version of JSON Spirit, which is easier to integrate but requires more compilation. The docs (http://www.codeproject.com/KB/recipes/JSON_Spirit.aspx#reduc) describe ways to increase compilation speed.

    Setup and Use gunrock/io

    gunrock/io can be used to generate visual representations of the performance of graph engines such as Gunrock. It takes the output of a graph algorithm run and can produce visual output in svg, pdf, png, html and md formats.

    gunrock/io Dependencies

    Below is an overview of the dependencies needed (as of Dec. 2016) to use gunrock/io to produce visual output for any graph algorithm:

    Below are the instructions to install the dependencies.

    Assume the machine has the following env setup:

    Nodejs and Npm

    First check if node and npm have been installed:

    On the command line, type:

        node -v
        npm -v

    If both commands print a version, move on to installing altair.

    Install nodejs and npm with root access:

        sudo apt-get install libcairo2-dev libjpeg8-dev libpango1.0-dev libgif-dev build-essential g++
        sudo apt-get install nodejs
        sudo apt-get install npm
        # Create a symbolic link for node, as many Node.js tools use this name to execute.
        sudo ln -s /usr/bin/nodejs /usr/bin/node

    Install altair

    sudo pip install altair

    Without root access, use the following commands:

        pip install --user altair
        vim ~/.bashrc
        HOME=/home/user
        PYTHONPATH=$HOME/.local/lib/python2.7/site-packages
        source ~/.bashrc

    More altair dependencies to save figures

        npm install -g vega@2.6.3 vega-lite   # "-g" option install npm/node_modules in /usr/local/bin
        npm -g bin                            # returns directory of installed binary files
        ls [returned directory]               # check if {vg2png vg2svg vl2png vl2svg vl2vg} exist in [returned directory]

    Without root access, use the following commands:

        npm install vega@2.6.3 vega-lite
        npm bin
        ls [returned directory]
        # npm install /node_modules in current directory
        # check if {vg2png vg2svg vl2png vl2svg vl2vg} exist in /bin or /.bin
        # Open .bashrc add:
        NPM_PACKAGES=/where/node_modules/folder/is/
        PATH=$NPM_PACKAGES/.bin:$PATH
        source ~/.bashrc

    More dependencies to save figure as pdf: inkscape

        sudo add-apt-repository ppa:inkscape.dev/stable
        sudo apt-get update
        sudo apt-get install inkscape

    How to use gunrock/io

    With all the dependencies installed, the following steps reproduce the performance figures from JSON in gunrock/io:

    1. Parse the engine outputs (in txt format) and generate JSON files containing the important information from the results, using text2json.py. (Instructions @ README)

    2. Make a folder for the output visual representation files.

    3. Use the existing scripts to generate different visualizations from the JSON files. For example, altair_engines.py generates a performance comparison across different graph engines. Below is an example makefile that generates the engine performance comparison figures as .md files and installs them into gunrock/doc:

        ENGINES_OUTPUTS = output/engines_topc.md \
        output/engines_topc_table_html.md
        PLOTTING_FILES = fileops.py filters.py logic.py
        DEST = "../../gunrock/doc/stats"
        ALL = $(ENGINES_OUTPUTS)
        all: $(ALL)
        $(ENGINES_OUTPUTS): altair_engines.py $(PLOTTING_FILES)
                ./altair_engines.py
        install: $(ALL)
                cp $(ALL) $(DEST)
        clean:
                rm $(ALL)
    

    After running these commands, the output .md files will be copied into gunrock/doc/stats. The output directory made in step 2 will also contain the generated .html, .svg, .png, .pdf, .eps and .json output files. To start a new Python script that produces other visualizations, please follow (script @ altair_engines.py).
