Speedup on multiple GPUs
We demonstrate multi-GPU ("mGPU") speedup over single-GPU performance for six primitives: betweenness centrality (BC), breadth-first search (BFS), connected components (CC), direction-optimizing BFS (DOBFS), PageRank (PR), and single-source shortest path (SSSP). We show both speedups on individual datasets and geometric means of runtime speedup over all datasets. These experiments ran on NVIDIA Tesla K40m GPUs.
Most of the primitives scale fairly well from 1 to 6 GPUs. The exception is DOBFS, which, unlike the other primitives, is primarily limited by inter-GPU communication overhead rather than by computation.
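As a minimal sketch of how the summary statistic above is computed: the geometric mean of per-dataset speedups (single-GPU runtime divided by mGPU runtime) is the standard way to average runtime ratios. The function name and the sample runtimes below are hypothetical, not taken from the actual measurement data.

```python
import math

def geomean_speedup(single_gpu_ms, multi_gpu_ms):
    """Geometric mean of per-dataset speedups.

    Each speedup is single-GPU runtime / multi-GPU runtime, so values
    above 1.0 mean the multi-GPU configuration is faster. The geometric
    mean is used (rather than the arithmetic mean) because speedups are
    ratios; averaging in log space keeps the result symmetric.
    """
    ratios = [s / m for s, m in zip(single_gpu_ms, multi_gpu_ms)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical runtimes (ms) for three datasets on 1 GPU vs. 6 GPUs.
one_gpu = [120.0, 80.0, 200.0]
six_gpu = [30.0, 25.0, 50.0]
print(f"geomean speedup: {geomean_speedup(one_gpu, six_gpu):.2f}x")
```

The per-dataset speedups here are 4.0x, 3.2x, and 4.0x, so the geometric mean lands a little below 4x.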
Source data, with links to the output JSON for each run