… currently duplicated from my old site
Fast Sorting for Exact OIT of Complex Scenes - 2014
Pyarelal Knowles, Geoff Leach and Fabio Zambetta
A paper in Computer Graphics International 2014, and a special issue of The Visual Computer.
Exact order independent transparency (OIT) techniques capture all fragments during rasterization. The fragments are then sorted per-pixel by depth and composited in order using alpha transparency. The sorting stage is a bottleneck for high depth complexity scenes, taking 70–95% of the total time for those investigated. In this paper we show that typical shader based sorting speed is impacted by local memory latency and occupancy. We present and discuss the use of both registers and an external merge sort in register-based block sort to better use the memory hierarchy of the GPU for improved OIT rendering performance. This approach builds upon backwards memory allocation, achieving an OIT rendering speed up to 1.7× that of the best previous method and 6.3× that of the common straightforward OIT implementation. In some cases the sorting stage is reduced to no longer be the dominant OIT component.
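As a rough illustration of the sort-then-composite step described in the abstract, here is a minimal CPU sketch in Python. It is not the shader code from the paper: the function name `resolve_pixel`, the fragment layout, the `block_size` default and the convention that a smaller depth is nearer are all assumptions made for this example. Small blocks are sorted independently (a stand-in for sorting in registers) and then merged, before blending far to near.

```python
import heapq

def resolve_pixel(fragments, background=(0.0, 0.0, 0.0), block_size=4):
    """Sort one pixel's fragments by depth and blend them back to front.

    fragments: list of (depth, (r, g, b), alpha) tuples.
    Assumption for this sketch: a smaller depth means nearer to the camera.
    """
    by_depth = lambda frag: frag[0]
    # Sort small fixed-size blocks independently (stand-in for register sorting).
    blocks = [sorted(fragments[i:i + block_size], key=by_depth)
              for i in range(0, len(fragments), block_size)]
    # Merge the sorted blocks into a single near-to-far sequence.
    ordered = list(heapq.merge(*blocks, key=by_depth))
    # Blend far to near with the "over" operator.
    r, g, b = background
    for _, (fr, fg, fb), alpha in reversed(ordered):
        r = fr * alpha + r * (1.0 - alpha)
        g = fg * alpha + g * (1.0 - alpha)
        b = fb * alpha + b * (1.0 - alpha)
    return (r, g, b)
```

The result does not depend on the order in which fragments were captured, which is the point of exact OIT: `resolve_pixel([red, green])` and `resolve_pixel([green, red])` give the same colour.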
A pre-print is available here: rbs-preprint.pdf
The final publication is available at Springer via: link.springer.com
I’ll also provide code at some point. Time is pretty tight right now. Though feel free to email me.
@article{knowles2014rbs,
author={Knowles, Pyarelal and Leach, Geoff and Zambetta, Fabio},
title={Fast sorting for exact OIT of complex scenes},
journal={The Visual Computer},
year={2014},
issn={0178-2789},
volume={30},
number={6-8},
doi={10.1007/s00371-014-0956-z},
url={http://dx.doi.org/10.1007/s00371-014-0956-z},
publisher={Springer Berlin Heidelberg},
keywords={Sorting; OIT; Transparency; Shaders; Performance; registers; Register-based block sort},
pages={603-613},
language={English}
}
Backwards Memory Allocation and Improved OIT - 2013
Pyarelal Knowles, Geoff Leach and Fabio Zambetta
A short paper in Pacific Graphics 2013.
Order independent transparency (OIT) is a graphics technique which sorts surfaces per-pixel for correct alpha blending. The sorting stage requires relatively large amounts of temporary memory in shaders that is usually conservatively allocated at a maximum, which impacts occupancy and performance. To address this issue we introduce backwards memory allocation (BMA), a strategy which creates a set of shaders with varying static allocation size in lieu of dynamic allocation. Batches of threads are then executed directly with the appropriate shader. This also allows optimizations for each generated shader such as choosing the sorting algorithm based on allocation size with no additional overhead. BMA gives both a more flexible OIT (BMA-OIT) for dynamic scenes of varying depth complexity and up to a 3× speedup.
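The batching idea in the abstract can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not the paper's implementation: the allocation sizes, the names `batch_pixels` and `make_sorter`, and the threshold for switching sorting algorithms are all hypothetical. Each pixel is assigned to the smallest fixed allocation size that fits its fragment count, and each size gets its own specialised sorter, standing in for one generated shader variant.

```python
from collections import defaultdict

# Hypothetical allocation sizes; one shader variant would be generated per size.
ALLOCATION_SIZES = (8, 16, 32, 64)

def batch_pixels(fragment_counts):
    """Group pixel ids by the smallest allocation size that fits their count.

    fragment_counts: dict mapping pixel id -> number of captured fragments.
    """
    batches = defaultdict(list)
    for pixel, count in fragment_counts.items():
        if count == 0:
            continue  # nothing to sort or composite for this pixel
        size = next((s for s in ALLOCATION_SIZES if count <= s), None)
        if size is None:
            raise ValueError(f"pixel {pixel} has {count} fragments, over the maximum")
        batches[size].append(pixel)
    return dict(batches)

def make_sorter(size):
    """Pick a sorting routine per allocation size (threshold is illustrative)."""
    def insertion_sort(depths):
        local = list(depths)  # stand-in for a fixed-size local array
        for i in range(1, len(local)):
            key = local[i]
            j = i - 1
            while j >= 0 and local[j] > key:
                local[j + 1] = local[j]
                j -= 1
            local[j + 1] = key
        return local
    # Small allocations use insertion sort; larger ones fall back to sorted().
    return insertion_sort if size <= 16 else sorted
```

For example, `batch_pixels({0: 3, 1: 8, 2: 9})` puts pixels 0 and 1 in the size-8 batch and pixel 2 in the size-16 batch, so no pixel pays for the worst-case allocation.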
A pre-print is available here: bma-preprint.pdf.
The definitive version is available at: https://diglib.eg.org/.
@inproceedings{knowles2013bma,
crossref = {PG2013short-proc},
author = {Pyarelal Knowles and Geoff Leach and Fabio Zambetta},
title = {{Backwards Memory Allocation and Improved OIT}},
booktitle = {Proceedings of Pacific Graphics 2013 (short papers)},
pages = {59-64},
location = {Singapore},
month = {October},
year = {2013},
URL = {http://diglib.eg.org/EG/DL/PE/PG/PG2013short/059-064.pdf},
DOI = {10.2312/PE.PG.PG2013short.059-064}
}
Efficient Layered Fragment Buffer Techniques - 2012
Pyarelal Knowles, Geoff Leach and Fabio Zambetta
A book chapter in OpenGL Insights.
Rasterization typically resolves visible surfaces using the depth buffer, computing just the front-most layer of fragments. However, some applications require all fragment data, including that of hidden surfaces. In this chapter, we refer to this data and the technique to compute it as a layered fragment buffer (LFB).
The chapter is a comparison of then-current order independent transparency techniques that captured all fragments in a single rendering pass: 3D array, linked list and linearized array LFB techniques. The focus is on the linearized method, which packs the fragment data using a prefix sum scan. A discussion of the implementation and performance results is included.
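The packing step of the linearized method can be illustrated with a small CPU sketch. This is not the chapter's code: the function name `build_linearized_lfb` and the `(pixel_index, data)` fragment layout are assumptions for this example, and the sketch simply counts first and writes second. Per-pixel fragment counts are turned into offsets with a prefix sum, and each fragment is then written into its pixel's slice of one packed array.

```python
from itertools import accumulate

def build_linearized_lfb(fragments, num_pixels):
    """Pack per-pixel fragment lists into one contiguous array.

    fragments: iterable of (pixel_index, data) pairs in arbitrary order.
    Returns (offsets, data), where pixel p owns data[offsets[p]:offsets[p + 1]].
    """
    fragments = list(fragments)
    # Count step: how many fragments land on each pixel.
    counts = [0] * num_pixels
    for pixel, _ in fragments:
        counts[pixel] += 1
    # Prefix sum of the counts gives each pixel's start offset (total at the end).
    offsets = [0] + list(accumulate(counts))
    # Write step: place every fragment into its pixel's slice.
    data = [None] * offsets[-1]
    cursor = offsets[:-1]  # slicing copies, so offsets itself is not modified
    for pixel, value in fragments:
        data[cursor[pixel]] = value
        cursor[pixel] += 1
    return offsets, data
```

With the returned `offsets`, the fragments of pixel `p` are read back as `data[offsets[p]:offsets[p + 1]]`, with no per-pixel maximum and no unused slots between pixels.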
The code sample can be found in the book’s GitHub repository.
@InCollection{knowles2012lfb,
author = {Pyarelal Knowles and Geoff Leach and Fabio Zambetta},
title = {Efficient Layered Fragment Buffer Techniques},
booktitle = {{O}pen{GL} {I}nsights},
pages = {279-292},
editor = {Patrick Cozzi and Christophe Riccio},
month = {July},
year = {2012},
isbn = {978-1439893760},
publisher = {CRC Press},
note = {\url{http://www.openglinsights.com/}}
}