Throughput vs. Frontier Size
While running BFS on several datasets, we measured the throughput in MTEPS (millions of traversed edges per second) achieved on each iteration. How does our performance vary as a function of the size of the input and output frontiers?
Ideally, throughput would be independent of the size of the input or output frontier, but in practice, the GPU requires a significant amount of work to become fully utilized. So for relatively small frontiers, performance increases with frontier size until we eventually reach maximum throughput. This is a significant reason why road-network-like graphs demonstrate much lower throughput (MTEPS) than scale-free graphs: road networks simply don't generate frontiers large enough to fully utilize the GPU.
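To make the metric concrete, here is a minimal sketch of computing per-iteration MTEPS from per-iteration edge counts and elapsed times. The record field names (`edges`, `elapsed_ms`, `frontier_out`) are illustrative assumptions, not the actual schema of our output JSON.

```python
# Hedged sketch: per-iteration MTEPS from edge counts and elapsed times.
# Field names below ("edges", "elapsed_ms", "frontier_out") are
# illustrative assumptions, not the actual JSON schema of our runs.

def mteps(edges_traversed, elapsed_ms):
    """Millions of traversed edges per second for one BFS iteration."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed time must be positive")
    # edges / seconds / 1e6  ==  edges / (elapsed_ms * 1000)
    return edges_traversed / (elapsed_ms * 1000.0)

# Hypothetical iterations: a large output frontier typically sustains
# far higher throughput than a small one.
iterations = [
    {"frontier_out": 1_000, "edges": 5_000, "elapsed_ms": 0.5},
    {"frontier_out": 2_000_000, "edges": 10_000_000, "elapsed_ms": 2.0},
]
for it in iterations:
    print(it["frontier_out"], mteps(it["edges"], it["elapsed_ms"]))
```

Plotting this quantity against frontier size per iteration (rather than one aggregate number per run) is what exposes the ramp-up-then-saturate behavior described above.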
The results suggest that output frontier size is a better predictor of performance than input frontier size.
Source data, with links to the output JSON for each run