tag:blogger.com,1999:blog-72774490279636234522024-02-07T07:12:27.346-08:00Ray Tracey's blogQuasi-random, more or less unbiased blog about real-time photorealistic GPU renderingSam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.comBlogger346125tag:blogger.com,1999:blog-7277449027963623452.post-18120816639422033292022-09-20T15:13:00.004-07:002022-09-22T21:02:24.859-07:00Nvidia Racer RTX<p style="text-align: justify;">Well this put a smile on my face. Nvidia just announced Racer RTX, a fully real-time raytraced minigame running on their Omniverse platform on a single RTX 4000. It looks quite a few steps up from Marbles RTX, which was already exceptional in itself. The lighting and shading quality is fairly unparalleled for a real-time application. It's amazing to see how quickly this technology has progressed in the last five years and to know that this will be available to everyone (who can afford a next-gen GPU) soon. If only GTA 6 would look as good as this...</p><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/AsykNkUMoNU" width="320" youtube-src-id="AsykNkUMoNU"></iframe></div><p></p><p style="text-align: justify;"><span>A comparison of ray tracing performance on RTX 3000 with DLSS 2.0 vs RTX 4000 with DLSS 3.0:<br /></span></p><p><span></span></p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/k9bqaHpFmGo" width="320" youtube-src-id="k9bqaHpFmGo"></iframe></div><br /><br /><p></p><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5hPEfZRAA-1wJHDny2okHFgD2gYln_06YxE3HOohUWLAM6garFg4dnbCyl9hBw-vzDhijRNdBdgIrCjMT1eJ5GY5chCteVvfJg0F-9DxGqax2XD1fCyOA1fXzdW8ivNh9ch_V6XqZqOovt2TvyRKysFy00HYnAmhQHDLM-nTFvsB8e8hzf-BwjoZh/s3840/racer_rtx1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2160" data-original-width="3840" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5hPEfZRAA-1wJHDny2okHFgD2gYln_06YxE3HOohUWLAM6garFg4dnbCyl9hBw-vzDhijRNdBdgIrCjMT1eJ5GY5chCteVvfJg0F-9DxGqax2XD1fCyOA1fXzdW8ivNh9ch_V6XqZqOovt2TvyRKysFy00HYnAmhQHDLM-nTFvsB8e8hzf-BwjoZh/w400-h225/racer_rtx1.png" width="400" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPptsL6G-U-5CsCgAcfDUNx0eaBPaYfXRDWaNwWvuUQZiku9SSdKFg8joTV4kTE8VnZgl9qxd6BknR3zbcHWsZ0GelSxwr4I-DUMupJTps7_snfhVT3ZxLT2fFKAZQgnEpziSiatbhd1WqjqBh2t72ccrhk-snUtxHFTDr0H__1y4StWZrTgLd1M-F/s3840/racer_rtx2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2160" data-original-width="3840" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPptsL6G-U-5CsCgAcfDUNx0eaBPaYfXRDWaNwWvuUQZiku9SSdKFg8joTV4kTE8VnZgl9qxd6BknR3zbcHWsZ0GelSxwr4I-DUMupJTps7_snfhVT3ZxLT2fFKAZQgnEpziSiatbhd1WqjqBh2t72ccrhk-snUtxHFTDr0H__1y4StWZrTgLd1M-F/w400-h225/racer_rtx2.png" width="400" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_jhNZkM3ZCkFF0ueCPR9tDm0e_ryVlmxpzSyHu_SHyGhjrBFRZVPA4q3HHuSKeD_ZTrB_v-94o1TCPngkQAElkykLd85dysZAsaU0tZ4290HxP3Kyf6JW7xskMnzxE_FkX0iN8omFtD106jkWy8xly0EdnwnOUhGNFYG7BfM8EZWlJDMNBMhecgeO/s3840/racer_rtx3.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2160" data-original-width="3840" height="225" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_jhNZkM3ZCkFF0ueCPR9tDm0e_ryVlmxpzSyHu_SHyGhjrBFRZVPA4q3HHuSKeD_ZTrB_v-94o1TCPngkQAElkykLd85dysZAsaU0tZ4290HxP3Kyf6JW7xskMnzxE_FkX0iN8omFtD106jkWy8xly0EdnwnOUhGNFYG7BfM8EZWlJDMNBMhecgeO/w400-h225/racer_rtx3.png" width="400" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdITydko_w3zOoMf7GAkEf0FfSt842laKpMZ55pdMTRczuyyTacOsgISmWCB69uTYCcSNjeMlbo-LHyGUcXEXm1b2wmJW93tApXHAZUOySKADKw0D7-0u5TTuSw3YooYIMLDvEp2cwYbwiukGqyC2MktCI4cndsq1WAgGFxP07jGImK-CWfRRpUb9I/s3840/racer_rtx4.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2160" data-original-width="3840" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdITydko_w3zOoMf7GAkEf0FfSt842laKpMZ55pdMTRczuyyTacOsgISmWCB69uTYCcSNjeMlbo-LHyGUcXEXm1b2wmJW93tApXHAZUOySKADKw0D7-0u5TTuSw3YooYIMLDvEp2cwYbwiukGqyC2MktCI4cndsq1WAgGFxP07jGImK-CWfRRpUb9I/w400-h225/racer_rtx4.png" width="400" /></a></div><br />Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com2tag:blogger.com,1999:blog-7277449027963623452.post-59866567125643270822021-08-14T07:16:00.002-07:002021-08-14T07:44:51.277-07:00Nvidia Omniverse renders everything and the kitchen sink<p>Last week at Siggraph, Nvidia released a fascinating making-of documentary of the Nvidia GTC keynote. 
It contains lots of snippets showing the real-time photorealistic rendering capabilities of Omniverse.<br /></p><p>Short version: <br /></p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/f_Lv8BOjs4E" width="320" youtube-src-id="f_Lv8BOjs4E"></iframe></div><div class="separator" style="clear: both; text-align: center;"><br /></div><p></p><p>Extended version:</p><p></p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/1qhqZ9ECm70" width="320" youtube-src-id="1qhqZ9ECm70"></iframe></div><br /><p></p><p>The Siggraph talk titled "Realistic digital human rendering with Omniverse RTX Renderer" is also a must-watch for anyone interested in CG humans: </p><p><a href="https://www.nvidia.com/en-us/on-demand/session/siggraph2021-sigg21-s-09/">https://www.nvidia.com/en-us/on-demand/session/siggraph2021-sigg21-s-09/</a></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnHuJjMGqWe09CwzRuIU_15k49D0Fsn1AUZdbtO-tilTPzi8Dj5oocbCfT0279SCFE2nJdbWDERXOYwdGRwvr75nepFmTf2tt8UiSeR_vMb8E1nC4hM7SwWLPi_zZg7tjcSRNZ-FJny84/s1200/digital_human.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="1200" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnHuJjMGqWe09CwzRuIU_15k49D0Fsn1AUZdbtO-tilTPzi8Dj5oocbCfT0279SCFE2nJdbWDERXOYwdGRwvr75nepFmTf2tt8UiSeR_vMb8E1nC4hM7SwWLPi_zZg7tjcSRNZ-FJny84/w400-h200/digital_human.jpg" width="400" /></a></div><br /><br /><p></p>Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com0tag:blogger.com,1999:blog-7277449027963623452.post-3520386649267212682021-08-07T00:44:00.003-07:002021-08-07T00:44:43.047-07:00Ray Tracing 
Gems II book released for free<p>The best things in life are free they say and that's certainly true for this gem. </p><p>Download it from here: <br /></p><p><a href="http://www.realtimerendering.com/raytracinggems/rtg2/index.html">http://www.realtimerendering.com/raytracinggems/rtg2/index.html</a><br /><br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2cQQvxsk0hpTHqB5riwj-6pU5iM73Hls0o08_FW0PyHpW5J5C4T8IXoKdwZVJOQzp-3G72lDS0r8iwGHPMAvMPa-sDIbUI7sccz58T-yGoc8O0BoxMhX2sfkiAIWjNm8qmavlCG0fyCk/s967/ray-tracing-gems-ii.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="967" data-original-width="737" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2cQQvxsk0hpTHqB5riwj-6pU5iM73Hls0o08_FW0PyHpW5J5C4T8IXoKdwZVJOQzp-3G72lDS0r8iwGHPMAvMPa-sDIbUI7sccz58T-yGoc8O0BoxMhX2sfkiAIWjNm8qmavlCG0fyCk/w305-h400/ray-tracing-gems-ii.png" width="305" /></a></div><br />Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com0tag:blogger.com,1999:blog-7277449027963623452.post-83462464361109821972020-09-10T14:09:00.036-07:002021-03-01T14:46:07.054-08:00Marbles RTX at night rendered in Nvidia Omniverse
<p>Nvidia showed an improved version of their Marbles RTX demo during the RTX 3000 launch event. What makes this new demo so impressive is that it appears to handle dozens of small lights without breaking a sweat, something which is <a href="https://cs.dartmouth.edu/~wjarosz/publications/bitterli20spatiotemporal.html" target="_blank">notoriously difficult for a path tracer</a>, let alone one of the real-time kind:</p><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/wfZ7NQ0Zc54" width="320" youtube-src-id="wfZ7NQ0Zc54"></iframe></div><br /><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8qnIXZKpJSklSSyrS2IAeOXISc6y-4ZZ8n84bBqD9eI-mQwruRdVDnTjlcxtqSiNUSGUFhAuX93k9e7Y-8j1PjkOQuXhPKD5DrnrkIDzuZMx4qAUaabWUhYZRhX76B5l216j2P3EbuW8/s1890/Screenshot+from+2020-09-10+23-37-40.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="796" data-original-width="1890" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8qnIXZKpJSklSSyrS2IAeOXISc6y-4ZZ8n84bBqD9eI-mQwruRdVDnTjlcxtqSiNUSGUFhAuX93k9e7Y-8j1PjkOQuXhPKD5DrnrkIDzuZMx4qAUaabWUhYZRhX76B5l216j2P3EbuW8/w512-h216/Screenshot+from+2020-09-10+23-37-40.png" title="Hard to believe these marbles are real-time..." 
width="512" /></a></div><p></p><p>Making of Marbles RTX (really fantastic):</p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/hsRcahyojtA" width="320" youtube-src-id="hsRcahyojtA"></iframe></div><br /><p></p><p>The animation is rendered in real-time in Nvidia's Omniverse, a new collaborative platform which features noise-free real-time path tracing and is already turning heads in the CGI industry. Nvidia now also shared the first sneak peek of Omniverse's capabilities: <br /></p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/HyZn4YAz89M" width="320" youtube-src-id="HyZn4YAz89M"></iframe></div><p></p><p> </p><p>... and real-time path traced gameplay:<br /><br /></p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/pudJRaMizb4" width="320" youtube-src-id="pudJRaMizb4"></iframe></div><p></p><p> </p><p>Siggraph 2020 in-depth overview of Omniverse: </p><p><a href="https://developer.nvidia.com/siggraph/2020/video/sigg05">https://developer.nvidia.com/siggraph/2020/video/sigg05</a><br /></p><p>Be sure to watch this one, because I have a feeling it will knock a few people off their rocker when it's released.</p><p><b><span style="color: #ffa400;">UPDATE 6 Oct 2020</span></b>: </p><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/o_XeGyg2NIo" 
width="320" youtube-src-id="o_XeGyg2NIo"></iframe></div><p></p><p><br /><b><span style="color: #ffa400;">UPDATE Dec 2020</span></b>: </p><p> Nvidia just released the <a href="https://www.nvidia.com/en-us/design-visualization/omniverse/?nvid=nv-int-cwmfg-36266">open beta of Omniverse</a>!<br /></p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/yf1guvkMznc" width="320" youtube-src-id="yf1guvkMznc"></iframe></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/vTGI7NXe9g0" width="320" youtube-src-id="vTGI7NXe9g0"></iframe></div><div class="separator" style="clear: both; text-align: center;"><br /><br /></div><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen="" class="BLOG_video_class" height="266" src="https://www.youtube.com/embed/Kdj_ry1mvmM" width="320" youtube-src-id="Kdj_ry1mvmM"></iframe></div><br /><p></p>Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com0tag:blogger.com,1999:blog-7277449027963623452.post-59016730384714768972020-05-14T18:11:00.001-07:002020-05-18T06:48:14.704-07:00Finally...Today Nvidia showed this astounding demo. Pure real-time ray tracing (with some deep learning based upscaling and denoising), no rasterization or baked lighting. So it finally happened...<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/H0_NZDSqR3Y/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/H0_NZDSqR3Y?feature=player_embedded" width="320"></iframe> </div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: x-small;">Check out the labels on the paint cans and books</span></div>
<br />
<br />
The workshop setting in the Marbles demo reminds me of an early demo of Arnold Render from the year 2000, which truly stunned me back then, as it was the first time I saw a CG animation that looked completely photorealistic. If it weren't for the ending, I would have thought it was clever stop-motion animation: <br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/_nQaK4AWbiA/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/_nQaK4AWbiA?feature=player_embedded" width="320"></iframe></div>
<br />
The above video was also how I first learned about unbiased rendering and path tracing, and it's ultimately why I started dabbling in real-time path tracing myself, trying to recreate a simplified version of the Arnold demo in real-time (experiment from 2011):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/8WMVsEdO99c/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/8WMVsEdO99c?feature=player_embedded" width="320"></iframe></div>
<br />
It's amazing to think that we have finally reached a point where the Pepeland demo in Arnold can be rendered with the same fidelity in real-time on a single GPU, merely 20 years after the original.<br />
<br />
I remember Nvidia first showing off real-time ray traced reflections on the GPU at GDC 2008 and GTC 2009 with a demo of a ray traced Bugatti Veyron running on a couple of pre-Fermi GPUs.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/BAZQlQ86IB4/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/BAZQlQ86IB4?feature=player_embedded" width="320"></iframe></div>
<br />Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com2tag:blogger.com,1999:blog-7277449027963623452.post-32370419786616347602019-09-15T06:00:00.004-07:002020-02-10T05:53:48.313-08:00LightHouse 2, the new OptiX based real-time GPU path tracing framework, released as open sourceJust before Siggraph, Jacco Bikker released Lighthouse 2, his new real-time path tracing framework as open source on Github:<br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="https://github.com/jbikker/lighthouse2">https://github.com/jbikker/lighthouse2</a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If you haven't heard of Jacco Bikker before, he is the original author of the Brigade engine, which pioneered the use of real-time path tracing in games (way before Nvidia got interested) and was released as open source in 2010 (see <a href="https://raytracey.blogspot.com/2010/04/real-time-pathtracing-demo-shows-future.html">https://raytracey.blogspot.com/2010/04/real-time-pathtracing-demo-shows-future.html</a>).<br />
<br />
Brigade was a real trailblazer and showed off a glimpse of what photorealistic games could look like in a not-so-distant future. Brigade 2, its successor (also developed by Jacco Bikker), was fully GPU-based, which pushed performance to another level.<br />
<br />
As I used to work a lot with Brigade and designed many tech demos with the engine for this blog (see for example <a href="https://raytracey.blogspot.com/2013/03/real-time-path-traced-carmageddon.html">https://raytracey.blogspot.com/2013/03/real-time-path-traced-carmageddon.html</a> and <a href="https://raytracey.blogspot.com/2013/10/brigade-3.html">https://raytracey.blogspot.com/2013/10/brigade-3.html</a>), I was quite thrilled to read that Jacco released a new path tracing engine which fully exploits OptiX and the new hardware accelerated RTX ray tracing cores on Nvidia's Turing GPUs. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The Lighthouse engine has a number of unique features: </div>
<ul style="text-align: justify;">
<li>Lighthouse uses Nvidia's OptiX framework, which provides state-of-the-art methods to build and traverse BVH acceleration structures, including a built-in "top level BVH" which allows for real-time animated scenes with thousands of individual meshes, practically for free. </li>
<li>There are 3 manually optimised OptiX render cores: </li>
<ul>
<li>OptiX 5 (for Maxwell and Pascal GPUs)</li>
<li>OptiX Prime (for Maxwell and Pascal GPUs)</li>
<li>OptiX 7 (with full RTX support for Turing GPUs)</li>
<ul>
<li>OptiX 7 is much lower level than previous OptiX versions, giving the developer more control, reducing overhead and delivering a substantial performance boost on Turing GPUs compared to OptiX 5/6 (about 35%) </li>
<li>A Turing GPU running Lighthouse 2 with OptiX 7 (with RTX support) is <u>about 6x faster</u> than a Pascal GPU running OptiX 5 for path tracing (you have to try it to believe it :-) )</li>
</ul>
</ul>
<li>Lighthouse incorporates the new "blue noise" sampling method (<a href="https://eheitzresearch.wordpress.com/762-2/">https://eheitzresearch.wordpress.com/762-2/</a>), which creates cleaner/less noisy looking images at low sample rates</li>
<li>Lighthouse manages a full game scene graph with instances, cameras, lights and materials, including the Disney BRDF (the so-called "principled" shader); all of their parameters can be edited on the fly through a lightweight GUI</li>
</ul>
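The "top level BVH" idea from the first bullet can be sketched in a few lines of Python. This is my own illustrative sketch, not Lighthouse or OptiX code: every instance shares one piece of object-space geometry (a unit sphere standing in for a full mesh BVH), and the ray is transformed into each instance's object space instead of duplicating the geometry. A real top-level BVH would cull most instances instead of looping over all of them.

```python
import math
import numpy as np

def hit_sphere(o, d, r):
    """Nearest positive t for ray o + t*d against a sphere of radius r at the
    object-space origin, or None. The direction need not be unit length."""
    a = d @ d
    b = 2.0 * (o @ d)
    c = o @ o - r * r
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t > 1e-6 else None

def trace_top_level(origin, direction, instances):
    """Two-level traversal sketch: transform the ray by each instance's 4x4
    world-to-object matrix and intersect the shared object-space primitive.
    The t values stay comparable across instances because the transformed
    direction is NOT renormalized."""
    best = None
    for inv in instances:  # inv: 4x4 world-to-object matrix
        o = inv[:3, :3] @ origin + inv[:3, 3]
        d = inv[:3, :3] @ direction
        t = hit_sphere(o, d, 1.0)
        if t is not None and (best is None or t < best):
            best = t
    return best
```

For example, two unit spheres instanced at x = 5 and x = 10, hit by a ray marching along +x from the origin, report a nearest hit at t = 4 without the sphere geometry ever being duplicated.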
<div style="text-align: justify;">
More in the Lighthouse 2 wiki: <a href="https://github.com/jbikker/lighthouse2/wiki">https://github.com/jbikker/lighthouse2/wiki</a> </div>
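To get a feel for what "blue noise" means, here is a minimal sketch of mine. It is not the precomputed-table method from the Heitz et al. paper linked above, but Mitchell's classic best-candidate algorithm, which produces the same kind of evenly spaced sample set: for each new sample, try several random candidates and keep the one farthest from everything placed so far.

```python
import math
import random

def best_candidate_points(n, k=20, seed=1):
    """Generate n 2D samples with a blue-noise-like (evenly spaced)
    distribution via Mitchell's best-candidate algorithm. Larger k gives
    more even spacing at the cost of more candidate tests."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random())]
    while len(pts) < n:
        best, best_d = None, -1.0
        for _ in range(k):
            c = (rng.random(), rng.random())
            d = min(math.dist(c, p) for p in pts)  # distance to nearest sample
            if d > best_d:
                best, best_d = c, d
        pts.append(best)
    return pts
```

Compared to plain white noise, the minimum distance between any two samples is much larger, which is exactly why low-sample-rate renders look cleaner: the error is pushed into high frequencies the eye (and a denoiser) handles well.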
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Some screenshots (rendered with Lighthouse's OptiX 7 core on an RTX 2060):</div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0ccz-qfYls1j1kgzPBS6Nokivk9RROjJXg5Ae0GEUY38bZHW6NsN_s6QMadWqxZWyY2RvzJacCE2WilaCg8pYuMAYahDGtIPpw1QHxvHdBUCWba8L0YErQ4M0EwFKsRsFGxGTuxS4T_I/s1600/lighthouse2_instanced_dragons.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="867" data-original-width="1600" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0ccz-qfYls1j1kgzPBS6Nokivk9RROjJXg5Ae0GEUY38bZHW6NsN_s6QMadWqxZWyY2RvzJacCE2WilaCg8pYuMAYahDGtIPpw1QHxvHdBUCWba8L0YErQ4M0EwFKsRsFGxGTuxS4T_I/s400/lighthouse2_instanced_dragons.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">1024 real-time ray traced dragons</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisGQqlIKaPKf3vHchMdP4doO8qTFQGfnKwOl1gkfIe79_j44aQChyphenhyphenQjfKWxaq3debglYbdyBmZTeFJiN9YOc0BiYe3ei4uKm_NB7-rGTNBYKIxqcMJ3GSlnCFHw1e3SHzIUOV4evSmpsI/s1600/lighthouse2_legocars.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="867" data-original-width="1600" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisGQqlIKaPKf3vHchMdP4doO8qTFQGfnKwOl1gkfIe79_j44aQChyphenhyphenQjfKWxaq3debglYbdyBmZTeFJiN9YOc0BiYe3ei4uKm_NB7-rGTNBYKIxqcMJ3GSlnCFHw1e3SHzIUOV4evSmpsI/s400/lighthouse2_legocars.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">2025 lego cars, spinning in real-time</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhT184eXYFm0DaMzFlYK_fWDSv6xvxkBBzfFWwQkLjpKid8FXMZaZW8-JISxzVBVA52tPi00mtn3hTsSYn4dVlgCVM8SxhZyszGmzaKOIYyRAnWpopUQn99hvAmEnE_Lr7gKey1fs2SSJg/s1600/lighthouse_mats.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="867" data-original-width="1600" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhT184eXYFm0DaMzFlYK_fWDSv6xvxkBBzfFWwQkLjpKid8FXMZaZW8-JISxzVBVA52tPi00mtn3hTsSYn4dVlgCVM8SxhZyszGmzaKOIYyRAnWpopUQn99hvAmEnE_Lr7gKey1fs2SSJg/s400/lighthouse_mats.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Lighthouse 2 material test scene</td></tr>
</tbody></table>
<div style="text-align: justify;">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxedliwpniSfoGWBWKSNL1gENuRWepfMC3Q9SX5SMzqhKww8AIbBmSg8_awR5AQfDZna3q4usW-6op6r05y0DZdXfdGjgwZFcD8pfHwMyTeRwvMU4zMJgjDCh8f9Tzqes3aJQJ75EWyro/s1600/lighthouse_cobra.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="867" data-original-width="1600" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxedliwpniSfoGWBWKSNL1gENuRWepfMC3Q9SX5SMzqhKww8AIbBmSg8_awR5AQfDZna3q4usW-6op6r05y0DZdXfdGjgwZFcD8pfHwMyTeRwvMU4zMJgjDCh8f9Tzqes3aJQJ75EWyro/s400/lighthouse_cobra.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">A real-time raytraced Shelby Cobra</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUdcSmUwK9JQJkKIUz8zCpBSohH0buSi0EsWJG9EW96hdLPeExdaiZrHWSQXHZF3CZCX_C273oN4TmHC6WTZtctadm-8fMxvCvgGSLRLoSy5NwtVulsrdbHPpdAO3H8OoiB994AC742r0/s1600/bunnies3.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="867" data-original-width="1600" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUdcSmUwK9JQJkKIUz8zCpBSohH0buSi0EsWJG9EW96hdLPeExdaiZrHWSQXHZF3CZCX_C273oN4TmHC6WTZtctadm-8fMxvCvgGSLRLoSy5NwtVulsrdbHPpdAO3H8OoiB994AC742r0/s400/bunnies3.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Just add bunnies</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
An old video of Sponza rendered with Lighthouse, showing off the real-time denoiser:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/uEDTtu2ky3o/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/uEDTtu2ky3o?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
Lighthouse is still a work in progress, but thanks to its relative simplicity it's easy to quickly test a new sampling algorithm or experiment with a new fast denoiser, keeping the code and performance on par with the state of the art in rendering research.<br />
<br />
Given that it handles real-time animation, offers state-of-the-art performance and is licensed under Apache 2.0, Lighthouse 2 may soon end up in professional 3D tools like Blender for fast, photorealistic previews of real-time animations. Next-gen game engine developers should also keep an eye on this.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Stay tuned for more™ !<br />
<br />
<h3>
Useful links</h3>
<ul>
<li>Lighthouse 2 code on Github: <a href="https://github.com/jbikker/lighthouse2">https://github.com/jbikker/lighthouse2</a></li>
<li>Lighthouse 2 wiki: <a href="https://github.com/jbikker/lighthouse2/wiki">https://github.com/jbikker/lighthouse2/wiki</a> (early stages)</li>
<li>Lighthouse 2 forum: <a href="https://ompf2.com/viewforum.php?f=18" rel="nofollow">https://ompf2.com/viewforum.php?f=18</a></li>
</ul>
<br />
<br />
<br />
P.S. I may release some executable demos for people who can't compile Lighthouse on their machines. </div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com0tag:blogger.com,1999:blog-7277449027963623452.post-23544326356195733692019-06-22T08:22:00.003-07:002019-06-22T08:49:27.820-07:00LightTracer, the first WebGL path tracer for photorealistic rendering of complex scenes in the browser<div style="text-align: justify;">
A couple of days ago, Denis Bogolepov sent me a link to LightTracer, a browser-based path tracer which he and Danila Ulyanov have developed. I'm quite impressed and excited about LightTracer, as it is the first WebGL-based path tracer that can render relatively complex scenes (including textures), something I've been waiting to see for a while (I tried something similar a few years ago, but WebGL still had too many limitations back then).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Link: <a href="http://lighttracer.org/">http://lighttracer.org/</a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
What makes LightTracer particularly interesting is that it has the potential to bring photoreal interactive 3D to the web, paving the way for online e-commerce stores offering their clients a fully photorealistic preview of an article (be it jewellery, cars, wristwatches, running shoes or handbags).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Up until now, online shops have tried several ways to offer their clients "photorealistic" previews with the ability to configure a product's materials and colours: precomputed 360-degree videos, interactive 3D using WebGL rasterization, and even server-side rendering via cloud-based ray tracing streamed to the browser (e.g. Clara.io and Lagoa Render), an approach which requires expensive servers and is tricky to scale.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
LightTracer's WebGL ray tracing offers a number of unique selling points:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- <b>ease of use</b>: it's entirely browser based, so nothing needs to be downloaded or installed</div>
<div style="text-align: justify;">
- <b>intuitive</b>: since ray tracing follows the physics of light, lights and materials behave just like in the real world, allowing non-rendering-experts to predictably light their scenes</div>
<div style="text-align: justify;">
- <b>photorealistic lighting and materials</b>: as Monte Carlo path tracing solves the full rendering equation without taking shortcuts, this results in truly photoreal scenes</div>
<div style="text-align: justify;">
- <b>speed</b>: LightTracer's ray tracing is accelerated by the GPU via WebGL, offering very fast previews. This should get even faster once WebGL supports hardware-accelerated ray tracing via Nvidia's RTX technology (and whatever AMD has in the works)</div>
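For reference, "solving the rendering equation" means estimating Kajiya's integral with a Monte Carlo sum. This is the standard textbook formulation, not anything specific to LightTracer's source:

```latex
% Outgoing radiance at surface point x in direction omega_o:
L_o(x, \omega_o) = L_e(x, \omega_o)
  + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, d\omega_i

% A path tracer averages N random direction samples drawn with density p:
L_o \approx L_e + \frac{1}{N} \sum_{k=1}^{N}
  \frac{f_r(x, \omega_k, \omega_o)\, L_i(x, \omega_k)\, (\omega_k \cdot n)}{p(\omega_k)}
```

The grain you see while a path traced image converges is simply the variance of this estimator shrinking as the sample count N grows.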
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAooz62umGI5P0wll-HObnsPSPVMn6XQlk3xIBo5HReXXUZS7wrlFJtuFtyJBLTITk-e_FAlljst-QMJC73ttg9cicnMYnNtWPUJV0ldhz0mgmTw1VdHrGhu8XfG838VLw0NRigKQVxjI/s1600/lighttracer1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="867" data-original-width="1600" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAooz62umGI5P0wll-HObnsPSPVMn6XQlk3xIBo5HReXXUZS7wrlFJtuFtyJBLTITk-e_FAlljst-QMJC73ttg9cicnMYnNtWPUJV0ldhz0mgmTw1VdHrGhu8XfG838VLw0NRigKQVxjI/s400/lighttracer1.jpg" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijBI7JCfvRXSV4kXBpG2bKD79H7hVIvhKNQAdF3HPAuFzB1WVbGdNTYJmaxP9hJHq1tXV48DQO2XshPHazu48ZyA8FnMqsP50kzKyAw3e10PTUVscsV6Huj5k9IUVJqxIcHb1coiuI1sA/s1600/lighttracer2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1021" data-original-width="1600" height="255" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijBI7JCfvRXSV4kXBpG2bKD79H7hVIvhKNQAdF3HPAuFzB1WVbGdNTYJmaxP9hJHq1tXV48DQO2XshPHazu48ZyA8FnMqsP50kzKyAw3e10PTUVscsV6Huj5k9IUVJqxIcHb1coiuI1sA/s400/lighttracer2.jpg" width="400" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/DZ01nne5Q50/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/DZ01nne5Q50?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
<div style="text-align: justify;">
LightTracer is still missing a few features, such as an easy-to-use subsurface scattering shader for realistic skin, hair and waxy materials, and there are plenty of optimisations possible (scene loading speed, UI improvements and presets, etc.) but I think this is the start of something big. </div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com0tag:blogger.com,1999:blog-7277449027963623452.post-79162101103149523722019-04-06T07:19:00.003-07:002020-12-25T13:23:34.118-08:00Unreal Engine now has real-time ray tracing and a path tracer Epic recently <a href="http://www.cgchannel.com/2019/04/epic-games-releases-unreal-engine-4-22/">released the stable version of Unreal Engine 4.22</a> which comes with real-time ray tracing and a fully fledged path tracer for ground truth images.<br />
<br />
<a href="https://www.unrealengine.com/en-US/blog/real-time-ray-tracing-new-on-set-tools-unreal-engine-4-22">https://www.unrealengine.com/en-US/blog/real-time-ray-tracing-new-on-set-tools-unreal-engine-4-22</a><br />
<br />
The path tracer is explained in more detail on this page: <a href="https://docs.unrealengine.com/en-us/Engine/Rendering/RayTracing/PathTracer">https://docs.unrealengine.com/en-us/Engine/Rendering/RayTracing/PathTracer</a><br />
<br />
The following video is an incredible example of an architectural visualisation rendered with Unreal's real-time raytraced reflections and refractions:<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/Qhg0KQj_-mM/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/Qhg0KQj_-mM?feature=player_embedded" width="320"></iframe></div>
<br />
<div style="text-align: justify;">
It's fair to say that real-time photorealism on consumer graphics cards has finally arrived. In the last few years, fast and performant path tracers have become available for free (e.g. Embree, OptiX, RadeonRays, Cycles) or virtually for free (e.g. Arnold, RenderMan). Thanks to advances in noise reduction algorithms, their rendering speed has been accelerated from multiple hours to a few seconds per frame. </div><div style="text-align: justify;"> </div><div style="text-align: justify;">The rate at which game engines, with Unreal at the forefront, are taking over the offline rendering world is staggering. Offline rendering for architecture will most probably disappear in the near future, replaced by game engines with real-time ray tracing features. </div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com0tag:blogger.com,1999:blog-7277449027963623452.post-28660729524029176762019-02-18T12:36:00.000-08:002019-02-20T23:12:59.509-08:00Nvidia releases OptiX 6.0 with support for hardware accelerated ray tracing<div style="text-align: justify;">
Nvidia recently released a new version of OptiX, which finally adds support for the much-hyped RT cores on the Turing GPUs (RTX 2080, Quadro RTX 8000, etc.), which provide hardware acceleration for ray-BVH and ray-triangle intersections.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
First results are quite promising. One user reports a speedup between 4x and 5x when using the RTX cores (compared to not using them). Another interesting revelation is that the speedup gets larger with higher scene complexity (geometry-wise, not shading-wise): </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="https://devtalk.nvidia.com/default/topic/1047464/optix/rtx-on-off-benchmark-optix-6/">https://devtalk.nvidia.com/default/topic/1047464/optix/rtx-on-off-benchmark-optix-6/</a> </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As a consequence, the Turing cards can render up to 10x faster in some scenes than the previous generation of Geforce cards, i.e. Pascal (GTX 1080), which is in fact two generations old if you take the Volta architecture into account (Volta was already a huge step up from Pascal in terms of rendering speed, so for Nvidia's sake it's better to compare Turing with Pascal).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This post will be updated with more OptiX benchmark numbers as they become available.</div>
<div style="text-align: justify;">
<br /></div>
<br />Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com0tag:blogger.com,1999:blog-7277449027963623452.post-71485046201331521222018-08-28T12:56:00.001-07:002018-09-04T13:33:45.562-07:00Chaos Group (V-Ray) announces real-time path tracer Lavina (+ Nvidia's really real-time path traced global illumination)<div style="text-align: justify;">
(See update at the bottom of this post to see something even more mindblowing)<br />
<br />
The Chaos Group blog features quite an interesting article about the speed increase which can be expected by using Nvidia's recently announced RTX cards: </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="https://www.chaosgroup.com/blog/what-does-the-new-nvidia-rtx-hardware-mean-for-ray-tracing-gpu-rendering-v-ray">https://www.chaosgroup.com/blog/what-does-the-new-nvidia-rtx-hardware-mean-for-ray-tracing-gpu-rendering-v-ray</a><br />
<br />
Excerpt:<br />
<blockquote class="tr_bq">
"Specialized hardware for ray casting has been attempted in the past, but has been largely unsuccessful — partly because the shading and ray casting calculations are usually closely related and having them run on completely different hardware devices is not efficient. Having both processes running inside the same GPU is what makes the RTX architecture interesting. We expect that in the coming years the RTX series of GPUs will have a large impact on rendering and will firmly establish GPU ray tracing as a technique for producing computer generated images both for off-line and real-time rendering."</blockquote>
</div>
<div style="text-align: justify;">
<br />
The article features a new research project, called Lavina, which is essentially doing real-time ray tracing and path tracing (with reflections, refractions and one GI bounce). The video below gets seriously impressive towards the end: </div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/K7LWzTvfgU0/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/K7LWzTvfgU0?feature=player_embedded" width="320"></iframe></div>
<br />
<div style="text-align: justify;">
Chaos Group have always been a frontrunner in real-time photorealistic ray tracing research on GPUs, even as far back as Siggraph 2009 where they showed off the first version of V-Ray RT GPU rendering on CUDA (see <a href="http://raytracey.blogspot.com/2009/08/race-for-real-time-ray-tracing.html">http://raytracey.blogspot.com/2009/08/race-for-real-time-ray-tracing.html</a> or <a href="https://www.youtube.com/watch?v=DJLCpS107jg">https://www.youtube.com/watch?v=DJLCpS107jg</a>). </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I have to admit that I'm both stoked and a bit jealous when I see what Chaos Group has achieved with project Lavina, as it is exactly what I hoped Brigade would turn into one day (Brigade was an early <a href="https://raytracey.blogspot.com/2010/04/real-time-pathtracing-demo-shows-future.html">real-time path tracing engine developed by Jacco Bikker in 2010</a>, which I experimented with and blogged about quite extensively, see e.g. <a href="http://raytracey.blogspot.com/2012/09/real-time-path-tracing-racing-game.html">http://raytracey.blogspot.com/2012/09/real-time-path-tracing-racing-game.html</a>). </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Then again, thanks to noob-friendly ray tracing APIs like Nvidia's RTX and OptiX, soon everyone's grandmother and their dog will be able to write a real-time path tracer, so all is well in the end.<br />
<br />
<b>UPDATE</b>: this talk by Nvidia researcher Jacopo Pantaleoni (famous for <a href="http://raytracey.blogspot.com/2011/07/voxelpipe-paper-and-preview-of-real.html">VoxelPipe and Weta's PantaRay</a> engine), "Real-time ray tracing for real-time global illumination", totally trounces the Lavina project, both in quality and in terms of dynamic scenes:<br />
<br />
<a href="http://on-demand.gputechconf.com/siggraph/2018/video/sig1813-4-jacopo-pantalenoni-real-time-global-illumination.html">http://on-demand.gputechconf.com/siggraph/2018/video/sig1813-4-jacopo-pantalenoni-real-time-global-illumination.html</a><br />
<br />
This is just mindblowing stuff. Can't wait to get my hands on it. This probably deserves its own blogpost.</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com0tag:blogger.com,1999:blog-7277449027963623452.post-66393721056538706672018-07-30T01:35:00.000-07:002018-07-31T12:40:11.450-07:00Nvidia gearing up to unleash real-time ray tracing to the masses<div style="text-align: justify;">
In the last two months, Nvidia roped in several high-profile, world-class ray tracing experts (mostly with a CPU ray tracing background):</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b><a href="http://pharr.org/matt/blog/2018/05/27/nvidia-bound.html">Matt Pharr</a></b><br />
<br />
One of the authors of the Physically Based Rendering books (<a href="http://www.pbrt.org/">www.pbrt.org</a>, some say it's the bible for Monte Carlo ray tracing). Before joining Nvidia, he was working at Google with Paul Debevec on Daydream VR, light fields and Seurat (<a href="https://www.blog.google/products/google-ar-vr/experimenting-light-fields/">https://www.blog.google/products/google-ar-vr/experimenting-light-fields/</a>), none of which took off in a big way for some reason.<br />
<br />
Before Google, he worked at Intel on Larrabee (Intel's failed attempt at making a GPGPU for real-time ray tracing and rasterisation that could compete with Nvidia GPUs) and on ISPC, a specialised compiler intended to extract maximum parallelism from the new Intel chips with AVX extensions. He described his time at Intel in great detail on his blog: <a href="http://pharr.org/matt/blog/2018/04/30/ispc-all.html">http://pharr.org/matt/blog/2018/04/30/ispc-all.html</a> (sounds like an awful company to work for).<br />
<br />
Intel also bought Neoptica, Matt's startup, which was supposed to research new and interesting rendering techniques for hybrid CPU/GPU chip architectures like the PS3's Cell.<br />
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b><a href="https://ingowald.blog/2018/07/27/joining-nvidia/">Ingo Wald</a></b><br />
<br />
Pioneering researcher in the field of real-time ray tracing from the Saarbrücken computer graphics group in Germany, who later moved to Intel and the University of Utah to work on very high-performance CPU-based ray tracing frameworks such as Embree (used in Corona Renderer and Cycles) and OSPRay.<br />
<br />
His PhD thesis "<a href="http://www.sci.utah.edu/~wald/PhD/wald_phd.pdf">Real-time ray tracing and interactive global illumination</a>" from 2004, describes a real-time GI renderer running on a cluster of commodity PCs and hardware accelerated ray tracing (OpenRT) on a custom fixed function ray tracing chip (SaarCOR).<br />
<br />
Ingo contributed a lot to the development of high-quality ray tracing acceleration structures (built with the surface area heuristic).<br />
<br /></div>
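For context, the surface area heuristic scores a candidate BVH split by its expected traversal cost. This is the standard formulation, independent of any particular implementation:

```latex
C_{\mathrm{split}} = C_{\mathrm{trav}}
  + \frac{A_L}{A_P} N_L\, C_{\mathrm{isect}}
  + \frac{A_R}{A_P} N_R\, C_{\mathrm{isect}}
```

where A_P, A_L, A_R are the surface areas of the parent and child bounding boxes, N_L and N_R the primitive counts in each child, and C_trav and C_isect the costs of one traversal step and one primitive intersection. The builder picks the split that minimises this cost, on the intuition that a random ray hits a child box with probability proportional to its surface area.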
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b><a href="http://erich.realtimerendering.com/">Eric Haines</a></b><br />
<br />
Main author of the famous <a href="http://www.realtimerendering.com/blog/">Real-time Rendering</a> blog, who worked until recently for Autodesk. He also used to maintain the <a href="http://www.realtimerendering.com/resources/RTNews/demos/overview.htm">Real-time Raytracing Realm</a> and <a href="http://www.realtimerendering.com/resources/RTNews/html/">Ray Tracing News</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiN2uUsijQBi53VznDMQpNXldWJ1Jb23wJtZkiGeVVtkU4c4o3WmOJJz7pCNVow41quSGByWKxMNYfq260jIlXpQGxQc01MlXHH-7N_a6dV0KxMpdvfPEvuGhfTnk5L7iN2Rb0Um3C7-tQ/s1600/Screen+Shot+2018-07-30+at+10.44.52.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="712" data-original-width="1256" height="226" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiN2uUsijQBi53VznDMQpNXldWJ1Jb23wJtZkiGeVVtkU4c4o3WmOJJz7pCNVow41quSGByWKxMNYfq260jIlXpQGxQc01MlXHH-7N_a6dV0KxMpdvfPEvuGhfTnk5L7iN2Rb0Um3C7-tQ/s400/Screen+Shot+2018-07-30+at+10.44.52.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
What connects these people is that they all have a passion for real-time ray tracing running in their blood, so having them all united under one roof is bound to give fireworks.<br />
<br /></div>
<div>
With these recent hires and initiatives such as <a href="https://developer.nvidia.com/rtx">RTX</a> (Nvidia's ray tracing API), it seems that Nvidia will be pushing real-time ray tracing into the mainstream really soon. I'm really excited to finally see it all come together. I'm pretty sure that ray tracing will soon be everywhere, and its quality and ease of use will displace rasterisation-based technologies (it's also the reason why I started <a href="http://raytracey.blogspot.com/2008/08/ruby-voxels-and-ray-tracing.html">this blog exactly ten years ago</a>).<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4oZLlCfOgAsx-3ANyTxeOUFQECOMIhosDyLPZgfVR1_Mc24jBKlzTz-V_0TPNjzZ5vfUZZzoWVZwvacrahdneTdN21plzN0cbxo4LD7MEXlnSbVgY7hgxyFdBdbz8CuSZ0regTkuZ7iA/s1600/Nvidia+raytraced+reflections.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="270" data-original-width="480" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4oZLlCfOgAsx-3ANyTxeOUFQECOMIhosDyLPZgfVR1_Mc24jBKlzTz-V_0TPNjzZ5vfUZZzoWVZwvacrahdneTdN21plzN0cbxo4LD7MEXlnSbVgY7hgxyFdBdbz8CuSZ0regTkuZ7iA/s400/Nvidia+raytraced+reflections.gif" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /><iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/tjf-1BxpR9c/0.jpg" src="https://www.youtube.com/embed/tjf-1BxpR9c?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
Also check "<a href="https://www.linkedin.com/jobs/view/senior-real-time-ray-tracing-engineer-at-nvidia-763664203/">Nvidia hiring Senior Real-time Ray Tracing Engineer</a>" </div>
<br />
<b>Senior Real Time Ray Tracing Engineer</b><br />
<b>NVIDIA, Santa Clara, CA, US</b><br />
<br />
<b>Job description</b><br />
<br />
<div style="text-align: justify;">
Are you a real-time rendering engineer looking to work on real-time ray tracing to redefine the look of video games and professional graphics applications? Are you a ray tracing expert looking to transform real-time graphics as we lead the convergence with film? Do you feel at home in complex video game codebases built on the latest GPU hardware and GPU software APIs before anybody else gets to try them?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
At NVIDIA we are developing the most forward-looking real-time rendering technology combining traditional graphics techniques with real-time ray tracing enabled by NVIDIA's RTX technology. We work at all levels of the stack, from the hardware and driver software, to the engine and application level code. This allows us to take on problems that others can only dream of solving at this point.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
We are looking for Real Time Rendering Software Engineers who are passionate about pushing the limits of what is possible with the best GPUs and who share our forward-looking vision of real-time rendering using real-time ray tracing.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In this position you will work with some of the world leading real-time ray tracing and rendering experts, developer technology engineers and GPU system software engineers. Your work will impact a number of products being worked on at NVIDIA and outside NVIDIA. These include the NVIDIA Drive Constellation autonomous vehicle simulator, NVIDIA Isaac virtual simulator for robotics, and NVIDIA Holodeck collaborative design virtual environment. Outside NVIDIA our work is laying the foundation for future video games and other rendering applications using real-time ray tracing. The first example of this impact is the NVIDIA GameWorks Ray Tracing denoising modules and much of the technology featured in our NVIDIA RTX demos at GDC 2018.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<b>What You Will Be Doing</b><br />
<ul>
<li>Implementing new rendering techniques in a game engine using real-time ray tracing with NVIDIA RTX technology </li>
<li>Improving the performance and quality of techniques you or others developed </li>
<li>Ensuring that the rendering techniques are robust and work well for the content needs of products using them </li>
</ul>
<br />
<b>What We Need To See</b><br />
<ul>
<li>Strong knowledge of C++ </li>
<li>BS/MS or higher degree in Computer Science or related field with 5+ years of experience </li>
<li>Up to date knowledge of real-time rendering and offline rendering algorithms and research </li>
<li>Experience with ray tracing in real-time or offline </li>
<li>Knowledge of the GPU Graphics Pipeline and GPU architecture </li>
<li>Experience with GPU Graphics and Compute programming APIs such as Direct3D 11, Direct3D 12, DirectX Raytracing, Vulkan, OpenGL, CUDA, OpenCL or OptiX </li>
<li>Experience writing shader code in HLSL or GLSL for these APIs. </li>
<li>Experience debugging, profiling and optimizing rendering code on GPUs </li>
<li>Comfortable with a complex game engine codebase, such as Unreal Engine 4, Lumberyard, CryEngine or Unity </li>
<li>Familiar with the math commonly used in real-time rendering </li>
<li>Familiar with multi-threaded programming techniques </li>
<li>Can do attitude, with the will to dive into existing code and do what it takes to accomplish your job </li>
<li>Ability to work well with others in a team of deeply passionate individuals who respect each other</li>
</ul>
</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com0tag:blogger.com,1999:blog-7277449027963623452.post-44576531866632100632018-07-22T05:46:00.000-07:002018-07-23T13:48:32.346-07:00Accelerating path tracing by using the BVH as multiresolution geometry<div style="text-align: justify;">
Before continuing the tutorial series, let's have a look at a simple but effective way to speed up path tracing. The idea is quite simple: like an octree, a bounding volume hierarchy (BVH) can double as both a ray tracing acceleration structure and a way to represent the scene geometry at multiple levels of detail (a multi-resolution geometry representation). Specifically, the axis-aligned bounding boxes (AABBs) of the BVH nodes at different depths in the tree serve as increasingly crude approximations of the geometry.<br />
<br /></div>
<div style="text-align: justify;">
Low-detail geometry enables much faster ray intersections and can be useful when light effects don't require full geometric accuracy, for example in the case of motion blur, glossy (blurry) reflections, soft shadows, ambient occlusion and global illumination with diffuse bounced lighting. Especially when geometry is not directly visible in the view frustum or in specular (mirror-like) reflections, using geometry proxies can provide a significant speedup (depending on the fault tolerance) at an almost imperceptible loss in quality.<br />
<br />
Advantages of using the BVH itself as multi-resolution LOD geometry representation:<br />
<ul>
<li>doesn't require an additional scene voxelisation step (the BVH itself provides the LOD), making it less memory-hungry</li>
<li>skips expensive triangle intersection when possible</li>
<li>performs only ray/box intersections (as opposed to having a mix of ray/triangle and ray/box intersections) which is more efficient on the GPU (avoids thread divergence) </li>
<li>the BVH can be stored in the GPU's cached texture memory, which is faster than the global memory that holds the triangles</li>
<li>BVH nodes can store extra attributes like smoothed normals, interpolated colours and on-the-fly generated GI</li>
</ul>
<div>
(Note: AFAIK low-level access to the acceleration structure is not provided by APIs like OptiX/RTX and DXR; this has to be written in CUDA, ISPC or OpenCL)</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The renderer determines the appropriate level of detail based on the distance from the camera for primary rays, or on the distance from the ray origin and the ray type for secondary rays (glossy/reflection, shadow, AO or GI rays). The following screenshots show the bounding boxes of the BVH nodes from depth 1 (depth 0 is the root node) up to depth 12:</div>
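The depth selection described above might be sketched as follows. The base depths per ray type and the distance falloff here are made-up illustrative values, not the heuristics of any actual renderer; the idea is simply that blurrier ray types and longer rays tolerate cruder proxies:

```cpp
#include <algorithm>
#include <cmath>

enum class RayType { Primary, Specular, Glossy, Shadow, AO, Diffuse };

// Map a ray's type and travelled distance to a maximum BVH depth. Beyond that
// depth, traversal stops descending and shades the node's AABB as a proxy.
// Base depths and the falloff rate are hypothetical tuning values.
int maxBVHDepth(RayType type, float distance)
{
    int base = 32;
    switch (type) {
        case RayType::Primary:
        case RayType::Specular: base = 32; break; // directly visible / mirror: full detail
        case RayType::Glossy:   base = 16; break; // blurry reflections forgive proxies
        case RayType::Shadow:   base = 12; break; // soft shadows likewise
        case RayType::AO:       base = 8;  break;
        case RayType::Diffuse:  base = 8;  break; // bounced GI is very forgiving
    }
    // Drop one level of detail each time the distance doubles past 1 unit.
    int drop = (distance > 1.0f) ? static_cast<int>(std::log2(distance)) : 0;
    return std::max(1, base - drop);
}
```

During traversal, once a node's depth reaches this cutoff, the renderer intersects only the node's AABB and returns attributes stored in the node (such as the smoothed normals and interpolated colours mentioned in the advantages list) instead of descending to the triangles.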
<div style="text-align: justify;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4L1RVTfiRT6qz58MVM7fht8L8pf-XIyGRy9ElJYPlec3m4ZaI6L3yPS65VBrs0EO5B3n3bf7YzWMwf0Q6pIRtiJCqT_LrLlflGFHWEiPnCj9MgVJVWRbJDbB33GR3mIDJmQhQe7QuH-Y/s1600/bunnyBVHnodes_level1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4L1RVTfiRT6qz58MVM7fht8L8pf-XIyGRy9ElJYPlec3m4ZaI6L3yPS65VBrs0EO5B3n3bf7YzWMwf0Q6pIRtiJCqT_LrLlflGFHWEiPnCj9MgVJVWRbJDbB33GR3mIDJmQhQe7QuH-Y/s400/bunnyBVHnodes_level1.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 1 (BVH level 0 is just the bunny's bounding box)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCvuN-FOR4nArC48Unyc97Hfc-P3pM-89SAAfeCmEKCG6lWsXNTqFQ5ZO2UR8Bo7Lagq9KdZAVhhBRQtHKfyjnF42XfBbtuI76ciWRyymYa3EzHdsrle6x1nx-t39-eqnN40WigU3pSns/s1600/bunnyBVHnodes_level2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCvuN-FOR4nArC48Unyc97Hfc-P3pM-89SAAfeCmEKCG6lWsXNTqFQ5ZO2UR8Bo7Lagq9KdZAVhhBRQtHKfyjnF42XfBbtuI76ciWRyymYa3EzHdsrle6x1nx-t39-eqnN40WigU3pSns/s400/bunnyBVHnodes_level2.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 2</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyUEE2x0SXPy-S4TWl9yirYjGWog12IM-2yQvcjoPH1rJcz-PmhZCvoChTSJVqUKGXbZWbfKoMR3M66oQ0g3S0YsRKzR-mvFX3eP3QD6Ss9PCX4cqEzBXCg1S5bSrzNshbTyk7Ob2kp3w/s1600/bunnyBVHnodes_level3.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyUEE2x0SXPy-S4TWl9yirYjGWog12IM-2yQvcjoPH1rJcz-PmhZCvoChTSJVqUKGXbZWbfKoMR3M66oQ0g3S0YsRKzR-mvFX3eP3QD6Ss9PCX4cqEzBXCg1S5bSrzNshbTyk7Ob2kp3w/s400/bunnyBVHnodes_level3.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 3</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbHLQhV0nGgYJSuOP8DwaWe1rUWr6Pkd_TWRcJBJHkK3MiXOZThmL97hpqil1_nOoI9JUTMxjFGnjAJayhnhXhC-93df3QvfRRN3HhTzgjkwnU8EA9KOxzNnwQyNI5aYqrjQ_oqSiGihA/s1600/bunnyBVHnodes_level4.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbHLQhV0nGgYJSuOP8DwaWe1rUWr6Pkd_TWRcJBJHkK3MiXOZThmL97hpqil1_nOoI9JUTMxjFGnjAJayhnhXhC-93df3QvfRRN3HhTzgjkwnU8EA9KOxzNnwQyNI5aYqrjQ_oqSiGihA/s400/bunnyBVHnodes_level4.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 4</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgD_0QokfedZoYuXH7c-yNQMa_bf2p0Rv8wH9aHg-_FCfNcTx8oFb2F-RUVbP1yYbVbjqfCfFVSfrZZJOPmVhg7qUcJvdIi1xtkkt6AD8-t8l8PI3SIZatGW3gcH49N5ZpvFyvFDvYUa9Y/s1600/bunnyBVHnodes_level5.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgD_0QokfedZoYuXH7c-yNQMa_bf2p0Rv8wH9aHg-_FCfNcTx8oFb2F-RUVbP1yYbVbjqfCfFVSfrZZJOPmVhg7qUcJvdIi1xtkkt6AD8-t8l8PI3SIZatGW3gcH49N5ZpvFyvFDvYUa9Y/s400/bunnyBVHnodes_level5.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 5</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg95rR8ExzD28ibCyqswycbHeYegMWqFUgHZYxDhv6ruJUYTsK8-ZDkfEOgHfGjUWJvnyEOjIMR7VMpbt6_2_jpc53-dqwBJY1saRP1ljQCFiPl9W5Whn3ibS65WoWxDqdOAgvGV7JRwZ0/s1600/bunnyBVHnodes_level6.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg95rR8ExzD28ibCyqswycbHeYegMWqFUgHZYxDhv6ruJUYTsK8-ZDkfEOgHfGjUWJvnyEOjIMR7VMpbt6_2_jpc53-dqwBJY1saRP1ljQCFiPl9W5Whn3ibS65WoWxDqdOAgvGV7JRwZ0/s400/bunnyBVHnodes_level6.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 6</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgu08B3EzH_ox2rs1PoV0fhlZkIecW2G_st2hG9NmiX_NxNQIgdJxp3hfiiya7eTASsR1mIvLkMYlXtG6UZV1h7DwGca9NXZdkp8WOeJAVOx04u9aaFtgDnXAcQQlIWgLVWOszvDt8itw/s1600/bunnyBVHnodes_level7.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgu08B3EzH_ox2rs1PoV0fhlZkIecW2G_st2hG9NmiX_NxNQIgdJxp3hfiiya7eTASsR1mIvLkMYlXtG6UZV1h7DwGca9NXZdkp8WOeJAVOx04u9aaFtgDnXAcQQlIWgLVWOszvDt8itw/s400/bunnyBVHnodes_level7.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 7</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiv33J3kYw0i0j7HmXyWNfmt_VsVvSnTbfSUC2jvMBwPmuOkBMpfxLr8LO0nHqh7yiA0QvQkQ7E454z6STKBv_gTbESmIkF4xUhqV64VXxOT2WT_HAjh5TGXhVeq17NkrOD4Cqt2dnqHaw/s1600/bunnyBVHnodes_level8.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiv33J3kYw0i0j7HmXyWNfmt_VsVvSnTbfSUC2jvMBwPmuOkBMpfxLr8LO0nHqh7yiA0QvQkQ7E454z6STKBv_gTbESmIkF4xUhqV64VXxOT2WT_HAjh5TGXhVeq17NkrOD4Cqt2dnqHaw/s400/bunnyBVHnodes_level8.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 8</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2u68selEHUNSJlawM1OaGWl1bBVpdtDJqlxyl6BDuAsN2Erm2T9H3WkoxTsHivupQWN4HC-JpRcJRm0K4OLbo_FqymbjNYFmryEnH2h9eWm0JgTjRcRGAr_-CxiqUf2uJ0lNtrS5rC6I/s1600/bunnyBVHnodes_level9.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2u68selEHUNSJlawM1OaGWl1bBVpdtDJqlxyl6BDuAsN2Erm2T9H3WkoxTsHivupQWN4HC-JpRcJRm0K4OLbo_FqymbjNYFmryEnH2h9eWm0JgTjRcRGAr_-CxiqUf2uJ0lNtrS5rC6I/s400/bunnyBVHnodes_level9.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 9</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisFTAh_857_GK-qjMbKHMVhG62jUmNUL4X6UjnFPcjVRXoCL4YdJxsWPpM0CjdCOBj3L858Sdzbff1OHgqLd2KMXGQLqPsFav700c2hS4ng7G5sf53X-sLVT97XZ0jJwCdH1wf-qtMcCQ/s1600/bunnyBVHnodes_level10.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisFTAh_857_GK-qjMbKHMVhG62jUmNUL4X6UjnFPcjVRXoCL4YdJxsWPpM0CjdCOBj3L858Sdzbff1OHgqLd2KMXGQLqPsFav700c2hS4ng7G5sf53X-sLVT97XZ0jJwCdH1wf-qtMcCQ/s400/bunnyBVHnodes_level10.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 10</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDpufciAn_Ar0WjHTnI9XFjufnKgba2Cc9TpQmFesi6jzPr-9aTMPOqD8OELYReci6omrppBdoDVvxkxN2FR2ELa8nLgWai9hk_wQjRZjBKSLTKPA3fFby1ZJ0F820WIhw6lhK7NbvlVc/s1600/bunnyBVHnodes_level11.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDpufciAn_Ar0WjHTnI9XFjufnKgba2Cc9TpQmFesi6jzPr-9aTMPOqD8OELYReci6omrppBdoDVvxkxN2FR2ELa8nLgWai9hk_wQjRZjBKSLTKPA3fFby1ZJ0F820WIhw6lhK7NbvlVc/s400/bunnyBVHnodes_level11.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 11</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggR3CbenZ-r9VDCqqZzKL1jISFxk0hNa2ZoEePD-FHw6F2HafDd6JhuECorJFWSOku4RD2mJPzC_lLm1bRFm_XPVY5LH8W05bTSR2pq6McQxRMd8LDA0MTHxkM36iSCmA6krQ1a9SxQVw/s1600/bunnyBVHnodes_level12.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggR3CbenZ-r9VDCqqZzKL1jISFxk0hNa2ZoEePD-FHw6F2HafDd6JhuECorJFWSOku4RD2mJPzC_lLm1bRFm_XPVY5LH8W05bTSR2pq6McQxRMd8LDA0MTHxkM36iSCmA6krQ1a9SxQVw/s400/bunnyBVHnodes_level12.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">BVH level 12 (this level contains mostly inner BVH nodes, but also a few leafnodes)</td></tr>
</tbody></table>
<span style="text-align: justify;">The screenshot below shows the bottom-most BVH level (i.e. leafnodes only, hence some holes are apparent):</span><br />
<div style="text-align: justify;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI5AP9KdmUhS_n9sw9pIMKzZ-k2LhQ0WKl9dK_E9eoRPWIu0zYI9kV6x49T_9pMNqV_Hkwjo46V_1IrVLph0xzZ4ap_QM428JRMPnsHco5uewrzugPsZHSaJpMlDakInK8hheleXrK1O4/s1600/bunnyBVHnodes_levelmax.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI5AP9KdmUhS_n9sw9pIMKzZ-k2LhQ0WKl9dK_E9eoRPWIu0zYI9kV6x49T_9pMNqV_Hkwjo46V_1IrVLph0xzZ4ap_QM428JRMPnsHco5uewrzugPsZHSaJpMlDakInK8hheleXrK1O4/s400/bunnyBVHnodes_levelmax.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Visualizing the BVH leafnodes (bottom most BVH level) </td></tr>
</tbody></table>
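The post doesn't show the code behind these per-level screenshots, but one plausible way to collect the boxes for a given level (a hypothetical sketch, not the actual renderer code) is a breadth-first walk down to the requested depth. Leaves that terminate above that depth simply drop out of the frontier, which is why the deepest levels show holes:

```python
# Hypothetical sketch: gather BVH nodes at a given depth for visualization.
class Node:
    def __init__(self, children=()):
        self.children = list(children)  # empty for leaf nodes

def nodes_at_level(root, level):
    """Breadth-first walk; leaves shallower than `level` contribute nothing."""
    frontier = [root]
    for _ in range(level):
        next_frontier = []
        for node in frontier:
            next_frontier.extend(node.children)
        frontier = next_frontier
    return frontier
```

At the requested depth the frontier can contain both inner nodes and leaves, matching the caption of the level 12 screenshot above.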
<div style="text-align: justify;">
The visualized normals are axis-aligned, but smoother normals can be precomputed per AABB vertex (and stored at low precision) by averaging the normals of the child AABBs a node contains, with the leafnodes averaging the normals of their triangles.<br />
<br />
TODO: upload the code to GitHub (or an alternative non-MS repo) and post the link, propose fixes to fill the holes, present benchmark results (8x speedup), get more Tim Tams </div>
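The per-leaf averaging described above could look roughly like this (a minimal NumPy sketch assuming an indexed triangle mesh; illustrative only, not the actual implementation):

```python
import numpy as np

def triangle_normals(vertices, faces):
    """Unit face normals of an indexed triangle mesh."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def leaf_normal(vertices, faces):
    """Average the face normals of one leaf's triangles and renormalize.
    The result could be quantized to low precision before storage."""
    avg = triangle_normals(vertices, faces).mean(axis=0)
    return avg / np.linalg.norm(avg)
```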
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com2tag:blogger.com,1999:blog-7277449027963623452.post-67577640871354030002018-06-01T15:15:00.002-07:002018-06-09T12:13:23.659-07:00Real-time path tracing on a 40 megapixel screen<div style="text-align: justify;">
The <a href="https://bluebrain.epfl.ch/">Blue Brain Project</a> is a Switzerland-based computational neuroscience project which aims to demystify how the brain works by simulating a biologically accurate brain on a state-of-the-art supercomputer. The simulation runs at multiple scales, from the whole brain level down to the tiny molecules which transport signals from one cell to another (neurotransmitters). The knowledge gathered from such an ultra-detailed simulation can be applied to advance neuroengineering and medical fields.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
To visualize these detailed brain simulations, we have been working on a high performance rendering engine, aptly named "Brayns". <a href="https://github.com/BlueBrain/Brayns">Brayns</a> uses ray tracing to render massively complex scenes comprised of trillions of molecules interacting in real-time on a supercomputer. The core ray tracing intersection kernels in Brayns are based on Intel's <a href="https://embree.github.io/">Embree</a> and <a href="https://www.ospray.org/">OSPRay</a> high performance ray tracing libraries, which are optimised for recent Intel CPUs (such as the Skylake architecture). These CPUs are basically a GPU in CPU disguise (their wide vector units descend from Intel's defunct <a href="https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)">Larrabee GPU </a>project), but they can render massive scientific scenes in real-time as they can address over a terabyte of RAM. What makes these CPUs so fast at ray tracing is a neat feature called <a href="https://en.wikipedia.org/wiki/AVX-512">AVX-512 extensions</a>, which runs several ray tracing calculations in parallel (in combination with <a href="http://pharr.org/matt/blog/2018/04/18/ispc-origins.html">ispc</a>), resulting in blazingly fast CPU ray tracing performance that rivals a GPU's and even beats it when the scene becomes very complex. </div>
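To illustrate the idea of running several ray tracing calculations in parallel, here is a NumPy sketch of the classic slab test evaluated for a whole packet of rays at once, conceptually similar to what AVX-512 and ispc do with 16-wide vector registers (an illustration of the principle, not Embree's actual code):

```python
import numpy as np

def rays_hit_aabb(origins, dirs, box_min, box_max):
    """Slab test for a packet of rays against one AABB, all rays at once.
    (Rays starting exactly on a slab plane would need extra NaN handling.)"""
    with np.errstate(divide="ignore"):
        inv = 1.0 / dirs                      # +/-inf for axis-parallel rays
    t0 = (box_min - origins) * inv
    t1 = (box_max - origins) * inv
    t_near = np.minimum(t0, t1).max(axis=1)   # latest entry across the slabs
    t_far = np.maximum(t0, t1).min(axis=1)    # earliest exit across the slabs
    return t_far >= np.maximum(t_near, 0.0)
```

A SIMD compiler like ispc generates essentially this pattern, but over hardware vector lanes instead of NumPy arrays.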
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Besides using Intel's superfast ray tracing kernels, Brayns has lots of custom code optimisations which allow it to render a fully path traced scene in real-time. These are some of the features of Brayns:<br />
<ul>
<li>hand optimised BVH traversal and geometry intersection kernels</li>
<li>real-time path traced diffuse global illumination</li>
<li>Optix real-time AI accelerated denoising</li>
<li>HDR environment map lighting</li>
<li>explicit direct lighting (next event estimation)</li>
<li>quasi-Monte Carlo sampling</li>
<li>volume rendering</li>
<li>procedural geometry</li>
<li>signed distance fields raymarching </li>
<li>instancing, making it possible to visualize billions of dynamic molecules in real-time</li>
<li>stereoscopic omnidirectional 3D rendering</li>
<li>efficient loading and rendering of multi-terabyte datasets</li>
<li>linear scaling across many nodes</li>
<li>optimised for real-time distributed rendering on a cluster with high speed network interconnection</li>
<li>ultra-low latency streaming to high resolution display walls and VR caves</li>
<li>modular architecture which makes it ideal for experimenting with new rendering techniques</li>
<li>optional noise and gluten free rendering</li>
</ul>
Below is a screenshot of an early real-time path tracing test on a 40 megapixel curved screen powered by seven 4K projectors: </div>
<div style="text-align: justify;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwR2uAbtxLC9z0QxmAYds08QOFsHyW7YCj8Jte-eVNstRo_2ajh_OomRi0AJCXBFoaAGadCi2iuw3VzlfmrApK0b8XpnCqOcWtOTpXEwmWnE_HD_5Bm2k3ZSyIjAyrQoBz6duym62S5wo/s1600/path_tracing_test_BBP_cave.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="900" data-original-width="1600" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwR2uAbtxLC9z0QxmAYds08QOFsHyW7YCj8Jte-eVNstRo_2ajh_OomRi0AJCXBFoaAGadCi2iuw3VzlfmrApK0b8XpnCqOcWtOTpXEwmWnE_HD_5Bm2k3ZSyIjAyrQoBz6duym62S5wo/s400/path_tracing_test_BBP_cave.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Real-time path traced scene on an 8 m by 3 m (25 by 10 ft) semi-cylindrical display,<br />
powered by seven 4K projectors (40 megapixels in total)</td></tr>
</tbody></table>
<div style="text-align: justify;">
<br /></div>
<span style="text-align: justify;">Seeing this scene projected lifesize in photorealistic detail on a 180 degree stereoscopic 3D screen and interacting with it in real-time is quite a breathtaking experience. Having 3D molecules zooming past the observer will be the next milestone. I haven't felt this thrilled about path tracing in quite some time.</span><br />
<span style="text-align: justify;"><br /></span>
<span style="text-align: justify;"><br /></span>
<br />
<h3>
<span style="text-align: justify;"><b>Technical/Medical/Scientific 3D artists wanted </b></span></h3>
<div>
<span style="text-align: justify;"><b><br /></b></span></div>
<div style="text-align: justify;">
We are currently looking for technical 3D artists to join our team to produce immersive neuroscientific 3D content. If this sounds interesting to you, get in touch by emailing me at sam.lapere@live.be</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com3tag:blogger.com,1999:blog-7277449027963623452.post-59449822614649244612017-12-22T03:30:00.000-08:002018-06-01T23:47:26.018-07:00Freedom of noise: Nvidia releases OptiX 5.0 with real-time AI denoiser <div style="text-align: justify;">
2018 will be bookmarked as a turning point for Monte Carlo rendering due to the wide availability of fast, high quality denoising algorithms, which can be attributed in large part to Nvidia Research: Nvidia just released OptiX 5.0 to developers, which contains a new GPU-accelerated "AI denoiser" that works as a post-processing filter.<br />
<br />
<div style="text-align: start;">
<a href="https://blogs.nvidia.com/blog/2017/12/07/nvidia-optix-ai-denoiser/">https://blogs.nvidia.com/blog/2017/12/07/nvidia-optix-ai-denoiser/</a></div>
<div style="text-align: start;">
<br /></div>
<div style="text-align: start;">
<a href="https://developer.nvidia.com/optix-denoiser">https://developer.nvidia.com/optix-denoiser</a></div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In contrast to traditional denoising filters, this new denoiser was trained with machine learning on a database of thousands of rendered image pairs (the noisy and the noise-free render of the same scene), which provides the denoiser with a "memory": instead of reconstructing the image from scratch (as a regular noise filter would), it "remembers" the solution from similar looking noisy scenes encountered during training and makes a best guess. That guess is often very close to the converged image but not exact, although the guesses progressively get better as the image refines and more data becomes available. By looking up the solution in its memory, the AI denoiser bypasses most of the costly calculations needed to reconstruct the image and therefore runs pretty much in real-time.<br />
<br />
The OptiX 5.0 SDK contains a sample program of a simple path tracer with the denoiser running on top (as a post-process). The results are nothing short of stunning: noise disappears completely, even difficult indirectly lit surfaces like refractive (glass) objects and shadowy areas clear up remarkably fast, and the image progressively gets closer to the ground truth. </div>
<div style="text-align: justify;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlUZznke1PEH77_MRGHRg1QgvStNDZtthwuEEuEEmEisgMGDIxQSSaWV-ZMDdqTy64MCpCU91ar56EfpPItEP6AHNkT3tPWhB9pxA-ZxU6eW1MukpVgColxZTKR9X9CSLj-JtP2hzNJSs/s1600/glass_denoise.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="768" data-original-width="1366" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlUZznke1PEH77_MRGHRg1QgvStNDZtthwuEEuEEmEisgMGDIxQSSaWV-ZMDdqTy64MCpCU91ar56EfpPItEP6AHNkT3tPWhB9pxA-ZxU6eW1MukpVgColxZTKR9X9CSLj-JtP2hzNJSs/s400/glass_denoise.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The OptiX denoiser works great for glass and dark, indirectly lit areas</td></tr>
</tbody></table>
<div style="text-align: justify;">
<br />
The denoiser is based on the Nvidia research paper "<a href="http://research.nvidia.com/publication/interactive-reconstruction-monte-carlo-image-sequences-using-recurrent-denoising">Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder</a>". The relentless Karoly Zsolnai from <a href="https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg">Two-minute papers</a> made an excellent video about this paper:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/YjjTPV2pXY0/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/YjjTPV2pXY0?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
While in general the denoiser does a fantastic job, it's not yet optimised to deal with areas that converge fast, and in some instances it overblurs and fails to preserve texture detail, as shown in the screen grab below. The blurring of texture detail improves over time with more iterations, but perhaps this initial overblurring can be solved with more training samples for the denoiser:</div>
<div style="text-align: justify;">
<span style="text-align: start;"><br /></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKHqothIMalEnKfYyp_EITNu54SW0xWmybvJ617tQdL1A0-L_iul2-ke1A4gPrdZ-Js3fH_X6U-LtBKf01S3ZRDPLtxhev68mX1VdLr3EwPa5T1_iCvafOqA6us-pBMHnVx4T9emr68q0/s1600/overblurring.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="768" data-original-width="1366" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKHqothIMalEnKfYyp_EITNu54SW0xWmybvJ617tQdL1A0-L_iul2-ke1A4gPrdZ-Js3fH_X6U-LtBKf01S3ZRDPLtxhev68mX1VdLr3EwPa5T1_iCvafOqA6us-pBMHnVx4T9emr68q0/s400/overblurring.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Overblurring of textures</td></tr>
</tbody></table>
<div style="text-align: justify;">
The denoiser is provided free for commercial use (royalty-free), but requires an Nvidia GPU. It works with both CPU and GPU rendering engines and is already implemented in Iray (Nvidia's own GPU renderer), V-Ray (by Chaos Group), Redshift Render and Clarisse (a CPU based renderer for VFX by Isotropix).</div>
<br />
Some videos of the denoiser in action in Optix, V-Ray, Redshift and Clarisse:<br />
<br />
Optix 5.0: <a href="https://www.youtube.com/watch?v=l-5NVNgT70U">youtu.be/l-5NVNgT70U</a><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/l-5NVNgT70U/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/l-5NVNgT70U?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
Iray: <a href="https://www.youtube.com/watch?v=yPJaWvxnYrg">youtu.be/yPJaWvxnYrg</a><br />
<br />
<div style="text-align: justify;">
This video shows the denoiser in action in Iray and provides a high level explanation of the deep learning algorithm behind the OptiX/Iray denoiser:</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/yPJaWvxnYrg/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/yPJaWvxnYrg?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
V-Ray 4.0: <a href="http://youtu.be/nvA4GQAPiTc">youtu.be/nvA4GQAPiTc</a><br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/nvA4GQAPiTc/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/nvA4GQAPiTc?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
Redshift: <a href="https://www.youtube.com/watch?v=ofcCQdIZAd8">youtu.be/ofcCQdIZAd8</a> (and <a href="https://blenderartists.org/forum/showthread.php?395313-Experimental-2-77-Cycles-Denoising-build&p=3267247&viewfull=1#post3267247">a post from Redshift's Panos</a> explaining the implementation in Redshift)<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/ofcCQdIZAd8/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/ofcCQdIZAd8?feature=player_embedded" width="320"></iframe></div>
<br />
ClarisseFX: <a href="https://www.youtube.com/watch?v=elWx5d7c_DI">youtu.be/elWx5d7c_DI</a><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/elWx5d7c_DI/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/elWx5d7c_DI?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
<div style="text-align: justify;">
Other renderers like Cycles and Corona already have their own built-in denoisers, but will probably benefit from the OptiX denoiser as well (especially Corona which was acquired by Chaos Group in September 2017).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The OptiX team has indicated that they are researching an optimised version of this filter for use in interactive to real-time photorealistic rendering, which might find its way into game engines. Real-time noise-free photorealistic rendering is tantalisingly close.</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com2tag:blogger.com,1999:blog-7277449027963623452.post-16965641740059059542017-07-09T00:47:00.001-07:002017-07-29T22:17:46.181-07:00Towards real-time path tracing: An Efficient Denoising Algorithm for Global Illumination<div style="text-align: justify;">
July is a great month for rendering enthusiasts: there's of course Siggraph, but the most exciting conference is <a href="http://www.highperformancegraphics.org/2017/program/">High Performance Graphics</a>, which focuses on (real-time) ray tracing. One of the more interesting sounding papers is titled: "Towards real-time path tracing: An Efficient Denoising Algorithm for Global Illumination" by Mara, McGuire, Bitterli and Jarosz, which was released a couple of days ago. The paper, video and source code can be found at</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="https://www.cs.dartmouth.edu/~wjarosz/publications/mara17towards.html">https://www.cs.dartmouth.edu/~wjarosz/publications/mara17towards.html</a></div>
<div style="text-align: justify;">
<br /></div>
<blockquote class="tr_bq" style="text-align: justify;">
<b>Abstract</b> </blockquote>
<blockquote class="tr_bq" style="text-align: justify;">
We propose a hybrid ray-tracing/rasterization strategy for real-time rendering enabled by a fast new denoising method. We factor global illumination into direct light at rasterized primary surfaces and two indirect lighting terms, each estimated with one path-traced sample per pixel. Our factorization enables efficient (biased) reconstruction by denoising light without blurring materials. We demonstrate denoising in under 10 ms per 1280×720 frame, compare results against the leading offline denoising methods, and include a supplement with source code, video, and data.</blockquote>
<br />
<div style="text-align: justify;">
While the premise of the paper sounds incredibly exciting, the results are disappointing. The denoising filter does a great job filtering out almost all the noise (apart from some that remains visible in reflections), but at the same time it kills pretty much all the realism that path tracing is famous for, producing flat and lifeless images. Even the first Crysis from 10 years ago (the first game with SSAO) looks distinctly better. I don't think applying such aggressive filtering algorithms to a path tracer will convince game developers to make the switch to path traced rendering anytime soon. A comparison with ground truth reference images (rendered to 5000 samples or more) is also lacking for some reason. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
At the same conference, a <a href="http://cwyman.org/papers.html">very similar paper</a> will be presented titled "Spatiotemporal Variance-Guided Filtering: Real-Time Reconstruction for Path-Traced Global Illumination". </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="http://cwyman.org/images/svgfTeaser.png">Teaser image</a></div>
<blockquote class="tr_bq" style="text-align: justify;">
<b>Abstract</b> </blockquote>
<blockquote class="tr_bq" style="text-align: justify;">
We introduce a reconstruction algorithm that generates a temporally stable sequence of images from one path-per-pixel global illumination. To handle such noisy input, we use temporal accumulation to increase the effective sample count and spatiotemporal luminance variance estimates to drive a hierarchical, image-space wavelet filter. This hierarchy allows us to distinguish between noise and detail at multiple scales using luminance variance. </blockquote>
<blockquote class="tr_bq" style="text-align: justify;">
Physically-based light transport is a longstanding goal for real-time computer graphics. While modern games use limited forms of ray tracing, physically-based Monte Carlo global illumination does not meet their 30 Hz minimal performance requirement. Looking ahead to fully dynamic, real-time path tracing, we expect this to only be feasible using a small number of paths per pixel. As such, image reconstruction using low sample counts is key to bringing path tracing to real-time. When compared to prior interactive reconstruction filters, our work gives approximately 10x more temporally stable results, matched reference images 5-47% better (according to SSIM), and runs in just 10 ms (+/- 15%) on modern graphics hardware at 1920x1080 resolution.</blockquote>
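For reference, the SSIM metric quoted in the abstract compares luminance, contrast and structure between a reconstructed image and its reference. A stripped-down single-window version of the formula looks like this (real implementations such as scikit-image's apply it over a sliding Gaussian window; this sketch shows the bare formula only):

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Simplified whole-image SSIM for two images with values in [0, data_range]."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))
```

Identical images score 1.0; anti-correlated images score well below that.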
<div style="text-align: justify;">
It's going to be interesting to see if the method in this paper produces more convincing results than the other paper. Either way, HPG has a bunch more interesting papers which are worth keeping an eye on.<br />
<br />
UPDATE (16 July): Christoph Schied from Nvidia and KIT, emailed me a link to the paper's preprint and video at <a href="http://cg.ivd.kit.edu/svgf.php">http://cg.ivd.kit.edu/svgf.php</a> Thanks Christoph!<br />
<br />
Video screengrab:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWvpLOLULzU88yI87fPyfWp8OmCiOE0u9BhTpaeoTfW_x46j43uRrOZGgx6D0OsHnhdaQFjflCZ9L2biEUu0IGPBlnmbHxQWOgsG86Wo8us_7UIDUcttyeLcJW_uASWXyjDh40JBO73hA/s1600/svgf_videograb.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="604" data-original-width="1076" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWvpLOLULzU88yI87fPyfWp8OmCiOE0u9BhTpaeoTfW_x46j43uRrOZGgx6D0OsHnhdaQFjflCZ9L2biEUu0IGPBlnmbHxQWOgsG86Wo8us_7UIDUcttyeLcJW_uASWXyjDh40JBO73hA/s400/svgf_videograb.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
I'm not convinced by the quality of filtered path traced rendering at 1 sample per pixel, but perhaps the improvements in spatiotemporal stability of this noise filter can be quite helpful for filtering animated sequences at higher sample rates.</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
UPDATE (23 July) There is another denoising paper out from Nvidia: "<i>Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder</i>" which uses machine learning to reconstruct the image.</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="http://research.nvidia.com/publication/interactive-reconstruction-monte-carlo-image-sequences-using-recurrent-denoising">http://research.nvidia.com/publication/interactive-reconstruction-monte-carlo-image-sequences-using-recurrent-denoising</a></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<blockquote style="clear: both;">
<b>Abstract</b> </blockquote>
<blockquote style="clear: both;">
We describe a machine learning technique for reconstructing image sequences rendered using Monte Carlo methods. Our primary focus is on reconstruction of global illumination with extremely low sampling budgets at interactive rates. Motivated by recent advances in image restoration with deep convolutional networks, we propose a variant of these networks better suited to the class of noise present in Monte Carlo rendering. We allow for much larger pixel neighborhoods to be taken into account, while also improving execution speed by an order of magnitude. Our primary contribution is the addition of recurrent connections to the network in order to drastically improve temporal stability for sequences of sparsely sampled input images. Our method also has the desirable property of automatically modeling relationships based on auxiliary per-pixel input channels, such as depth and normals. We show significantly higher quality results compared to existing methods that run at comparable speeds, and furthermore argue a clear path for making our method run at realtime rates in the near future.</blockquote>
</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com5tag:blogger.com,1999:blog-7277449027963623452.post-63629249531946920522017-05-21T03:09:00.003-07:002017-05-23T04:10:59.689-07:00Practical light field rendering tutorial with Cycles<div style="text-align: justify;">
This week Google announced "Seurat", a novel surface lightfield rendering technology which would enable "real-time cinema-quality, photorealistic graphics" on mobile VR devices, developed in collaboration with ILMxLab:</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/cPQum_be9wk/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/cPQum_be9wk?feature=player_embedded" width="320"></iframe></div>
<br />
<div style="text-align: justify;">
The technology captures all light rays in a scene by pre-rendering it from many different viewpoints. At runtime, entirely new viewpoints are created by interpolating those pre-rendered viewpoints on the fly, resulting in photoreal reflections and lighting in real-time (http://www.roadtovr.com/googles-seurat-surface-light-field-tech-graphical-breakthrough-mobile-vr/).</div>
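Seurat's actual reconstruction is considerably more sophisticated (it builds surface light fields), but the core idea of synthesizing a new viewpoint by blending pre-rendered neighbouring views can be sketched in a few lines (a toy illustration, not Google's method):

```python
import numpy as np

def blend_views(views, u, v):
    """Bilinearly blend the four pre-rendered views nearest to the fractional
    camera-grid position (u, v). `views` maps integer grid coords to images."""
    i, j = int(u), int(v)
    fu, fv = u - i, v - j
    return ((1 - fu) * (1 - fv) * views[i, j]
            + fu * (1 - fv) * views[i + 1, j]
            + (1 - fu) * fv * views[i, j + 1]
            + fu * fv * views[i + 1, j + 1])
```

Real systems additionally reproject each view using depth before blending, which is what keeps reflections and parallax correct.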
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
At almost the same time, Disney released a paper called "<a href="https://www.disneyresearch.com/publication/real-time-rendering-with-compressed-animated-light-fields/">Real-time rendering with compressed animated light fields</a>", demonstrating the feasibility of rendering a Pixar quality 3D movie in real-time where the viewer can actually be part of the scene and walk in between scene elements or characters (according to a predetermined camera path):</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/-Hc92mP3GLw/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/-Hc92mP3GLw?feature=player_embedded" width="320"></iframe></div>
<br />
<div style="text-align: justify;">
Light field rendering in itself is not a new technique and has actually been around for more than 20 years, but it has only recently become viable. The first paper was released at Siggraph 1996 ("<a href="http://graphics.stanford.edu/papers/light/">Light field rendering</a>" by Mark Levoy and Pat Hanrahan) and the method has since been incrementally improved by others. Stanford University compiled an entire archive of light fields to accompany the 1996 Siggraph paper, which can be found at <a href="http://graphics.stanford.edu/software/lightpack/lifs.html">http://graphics.stanford.edu/software/lightpack/lifs.html</a>. A more up-to-date archive of photography-based light fields can be found at <a href="http://lightfield.stanford.edu/lfs.html">http://lightfield.stanford.edu/lfs.html</a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
One of the first movies that showed a practical use for light fields is The Matrix from 1999, where an array of cameras firing at the same time (or in rapid succession) made it possible to pan around an actor to create a super slow motion effect ("bullet time"):</div>
<div style="text-align: justify;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip0YtBfoWBUPIvdGy90FIHUD1UQkQQtczJwYgwayrrm44hljeZwijI4KKn0CyA0HbFelSKgTtBEMmBwD4SE_40ssTSPSgLVY591xMGo7XTj-1Ug93ZxoL7bsi8xfFhuV1LHVLY0fpWjpg/s1600/matri_bullet_time.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="267" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip0YtBfoWBUPIvdGy90FIHUD1UQkQQtczJwYgwayrrm44hljeZwijI4KKn0CyA0HbFelSKgTtBEMmBwD4SE_40ssTSPSgLVY591xMGo7XTj-1Ug93ZxoL7bsi8xfFhuV1LHVLY0fpWjpg/s400/matri_bullet_time.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Bullet time in The Matrix (1999)</td></tr>
</tbody></table>
<br />
<b style="font-size: x-large;">Rendering the light field</b><br />
<br />
<div style="text-align: justify;">
Instead of attempting to explain the theory behind light fields (for which there are plenty of excellent online sources), the main focus of this post is to show how to quickly get started with rendering a synthetic light field using Blender Cycles and some open-source plug-ins. If you're interested in a crash course on light fields, check out Joan Charmant's video tutorial below, which explains the basics of implementing a light field renderer:</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/p2w1DNkITI8/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/p2w1DNkITI8?feature=player_embedded" width="320"></iframe></div>
<br />
<div style="text-align: justify;">
The following video demonstrates light fields rendered with Cycles:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/AYkC8YeO9wU/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/AYkC8YeO9wU?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
Rendering a light field is actually surprisingly easy with Blender's Cycles and doesn't require much technical expertise (besides knowing how to build the plug-ins). For this tutorial, we'll use a couple of open-source plug-ins:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
1) The first one is the light field camera grid add-on for Blender made by Katrin Honauer and Ole Johanssen from Heidelberg University in Germany: </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="https://github.com/lightfield-analysis/blender-addon">https://github.com/lightfield-analysis/blender-addon</a> </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This plug-in sets up a camera grid in Blender and renders the scene from each camera using the Cycles path tracing engine. Good results can be obtained with a grid of 17 by 17 cameras with a distance of 10 cm between neighbouring cameras. For high quality, a 33-by-33 camera grid with an inter-camera distance of 5 cm is recommended.</div>
<div style="text-align: justify;">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhciQ4JYc4LUf4czb6Bx7Zh9ibq3ivdCctLdyfIAoJ6ps8e3-qSF7-D_xf9wLHTPmmpLYTXkavKfy7oYDLJfE6_9BaaUDWP3MlX63rC0zHHbYe5UQ2Tf_VvSoHhKlVbMayoYcElljN2buA/s1600/lightfield_camera_grid.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhciQ4JYc4LUf4czb6Bx7Zh9ibq3ivdCctLdyfIAoJ6ps8e3-qSF7-D_xf9wLHTPmmpLYTXkavKfy7oYDLJfE6_9BaaUDWP3MlX63rC0zHHbYe5UQ2Tf_VvSoHhKlVbMayoYcElljN2buA/s400/lightfield_camera_grid.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">3-by-3 camera grid with their overlapping frustums</td></tr>
</tbody></table>
<br />
<div style="text-align: justify;">
2) The second tool is the light field encoder and WebGL-based light field viewer created by Michal Polko, found at <a href="https://github.com/mpk/lightfield">https://github.com/mpk/lightfield</a> (build instructions are included in the readme file).<br />
<br />
This plug-in takes all the images generated by the first one and compresses them by keeping a number of keyframes and encoding only the deltas for the remaining intermediate frames. The viewer is WebGL-based and uses virtual texturing (similar to Carmack's megatextures) for fast, on-the-fly reconstruction of new viewpoints from the pre-rendered ones (via hardware-accelerated bilinear interpolation on the GPU).</div>
<br />
<br />
<span style="font-size: large;"><b>Results and Live Demo</b></span><br />
<br />
<span style="text-align: justify;">A live online demo of the light field with the dragon can be seen here: </span><br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="https://raytracey.neocities.org/">https://raytracey.neocities.org</a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
You can change the viewpoint (within the limits of the original camera grid) and refocus the image in real-time by clicking on the image. </div>
<div style="text-align: justify;">
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinxKlFZfBd7kZrWP7aRsaI78N3_lTGU2k9XLnxKISQvmbdSj-G85xG5VpRwWmA1Ok4fA_uVuXJVKygNyjth5PV_1BMi8VZngkjl9vT1XGJcuUKSUrwAQBeiKnWZ_DeUfM98OuB7MZSMyo/s1600/dragonLF1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinxKlFZfBd7kZrWP7aRsaI78N3_lTGU2k9XLnxKISQvmbdSj-G85xG5VpRwWmA1Ok4fA_uVuXJVKygNyjth5PV_1BMi8VZngkjl9vT1XGJcuUKSUrwAQBeiKnWZ_DeUfM98OuB7MZSMyo/s400/dragonLF1.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjigD-umeVmx9Fgb5ORpGKFEAVAdSzorTulhKQKUBHndSki7w0tvyxESg-Hn-BeyJEXKRXduUYqGWGFMCT5BtFDssFhUKssOfV8lQEwYpEAQ1xt_xx_p90oJQm4rGzjRo00RujBE2CAt-Q/s1600/dragonLF2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjigD-umeVmx9Fgb5ORpGKFEAVAdSzorTulhKQKUBHndSki7w0tvyxESg-Hn-BeyJEXKRXduUYqGWGFMCT5BtFDssFhUKssOfV8lQEwYpEAQ1xt_xx_p90oJQm4rGzjRo00RujBE2CAt-Q/s400/dragonLF2.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzm22zifPV-n6mhHCTTS1-G79EEgU6MLQ1Z9Yo6XMHUTOcPZBXYPW4A1waJMAANCBDsGCeBAEx1C9_vodJsNobTya02584Gi9yCykrK-lzDfn-zXce8lilkpwOKlbJvRp-YSzfnC2_ch0/s1600/dragonLF3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzm22zifPV-n6mhHCTTS1-G79EEgU6MLQ1Z9Yo6XMHUTOcPZBXYPW4A1waJMAANCBDsGCeBAEx1C9_vodJsNobTya02584Gi9yCykrK-lzDfn-zXce8lilkpwOKlbJvRp-YSzfnC2_ch0/s400/dragonLF3.png" width="400" /></a></div>
<br /></div>
<div style="text-align: justify;">
I rendered the Stanford dragon using a 17-by-17 camera grid with a distance of 5 cm between adjacent cameras. The light field was created by rendering the scene from 289 (17x17) different camera viewpoints, which took about 6 minutes in total (about 1 to 2 seconds of render time per 512x512 image on a good GPU). The 289 renders are then heavily compressed (for this scene, the 107 MB batch of 289 images was compressed down to only 3 MB!). </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A depth map is also created at the same time and enables on-the-fly refocusing of the image by interpolating information from several images. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A later tutorial will add a bit more freedom to the camera, allowing for rotation and zooming.</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com5tag:blogger.com,1999:blog-7277449027963623452.post-88297662045084908332017-03-20T23:39:00.000-07:002017-03-20T23:39:37.777-07:00Virtual reality<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmX7eFV10I4caKXO1iA-HSclm95JIdiTtQGSWAGPn3g7UBAk80ec5JXGD8id8rayNMhkhZIGGGBuoZwNOAo7TIuhSQTcRlVni9AdZCs7wgP72roxRfyQ_U9gFkyF8zy15SS3tHHcP7p0c/s1600/gearvr.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmX7eFV10I4caKXO1iA-HSclm95JIdiTtQGSWAGPn3g7UBAk80ec5JXGD8id8rayNMhkhZIGGGBuoZwNOAo7TIuhSQTcRlVni9AdZCs7wgP72roxRfyQ_U9gFkyF8zy15SS3tHHcP7p0c/s640/gearvr.jpg" width="520" /></a></div>
<br />Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com3tag:blogger.com,1999:blog-7277449027963623452.post-79161147301065882512017-01-11T17:29:00.000-08:002017-01-11T17:38:50.567-08:00OpenCL path tracing tutorial 3: OpenGL viewport, interactive camera and defocus blurJust a link to the source code on Github for now, I'll update this post with a more detailed description when I find a bit more time:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjETEzSgCP_mf50RjCVoSxxaoHcuZ-GYguF9RaDIrNfjSMNA3bSSdftZ6xhUCKKhoaTKxBeWKpSbqiTC8eqAYZ79UCCg_cqQ7d7x9RcHQZZpa-Qn-nVvoByqN4QL2BLrzHqD0Mmp3Jfii8/s1600/opencl_tut3_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjETEzSgCP_mf50RjCVoSxxaoHcuZ-GYguF9RaDIrNfjSMNA3bSSdftZ6xhUCKKhoaTKxBeWKpSbqiTC8eqAYZ79UCCg_cqQ7d7x9RcHQZZpa-Qn-nVvoByqN4QL2BLrzHqD0Mmp3Jfii8/s400/opencl_tut3_3.png" width="400" /></a></div>
<br />
<br />
<span style="font-size: large;"><b>Part 1</b> <b>Setting up an OpenGL window</b></span> <br />
<br />
<a href="https://github.com/straaljager/OpenCL-path-tracing-tutorial-3-Part-1">https://github.com/straaljager/OpenCL-path-tracing-tutorial-3-Part-1</a><br />
<br />
<br />
<br />
<br />
<span style="font-size: large;"><b>Part 2</b> <b>Adding an interactive camera, depth of field and progressive rendering</b></span><br />
<br />
<a href="https://github.com/straaljager/OpenCL-path-tracing-tutorial-3-Part-2">https://github.com/straaljager/OpenCL-path-tracing-tutorial-3-Part-2</a><br />
<br />
<br />
<br />
Thanks to Erich Loftis and Brandon Miles for useful tips on improving the generation of random numbers in OpenCL to avoid the distracting artefacts (showing up as a sawtooth pattern) when using defocus blur (still not perfect but much better than before).<br />
<br />
The next tutorial will cover rendering of triangles and triangle meshes. Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com4tag:blogger.com,1999:blog-7277449027963623452.post-89447450498379080632016-11-28T23:31:00.002-08:002016-12-05T20:09:09.111-08:00Wanted: GPU rendering developers<div style="text-align: justify;">
I'm working for an international company with very large ("YUUUUUGE") industry partners.<br /><br />We are currently looking for excellent developers with experience in GPU rendering (path tracing) for a new project.<br /><br />Our ideal candidates have either a:</div>
<ul style="text-align: justify;">
<li>Bachelor in Computer Science, Computer/Software Engineering or Physics with a minimum of 2 years of work experience in a relevant field, or</li>
<li>Master in Computer Science, Computer/Software Engineering or Physics, or</li>
<li>PhD in a relevant field</li>
</ul>
<div style="text-align: justify;">
and a strong interest in physically based rendering and ray tracing.</div>
<div style="text-align: justify;">
<br />
<br />
Self-taught programmers are encouraged to apply if they meet the following requirements:</div>
<ul style="text-align: justify;">
<li>you breathe rendering and have Monte Carlo simulations running through your blood</li>
<li>you have a copy of PBRT (<a href="http://www.pbrt.org/">www.pbrt.org</a>, version 3 was released just last week) on your bedside table</li>
<li>provable experience working with open source rendering frameworks such as PBRT, LuxRender, Cycles, AMD RadeonRays or with a commercial renderer will earn you extra brownie points</li>
<li>5+ years of experience with C++</li>
<li>experience with CUDA or OpenCL</li>
<li>experience with version control systems and working on large projects</li>
<li>proven rendering track record (publications, Github projects, blog)</li>
</ul>
<div style="text-align: justify;">
<br />
Other requirements:</div>
<ul style="text-align: justify;">
<li>insatiable hunger to innovate</li>
<li>a "can do" attitude</li>
<li>strong work ethic and focus on results</li>
<li>continuous self-learner</li>
<li>work well in a team</li>
<li>work independently and able to take direction</li>
<li>ability to communicate effectively</li>
<li>comfortable speaking English</li>
<li>own initiatives and original ideas are highly encouraged</li>
<li>willing to relocate to New Zealand</li>
</ul>
<div style="text-align: justify;">
<br />
What we offer:</div>
<ul style="text-align: justify;">
<li>unique location in one of the most beautiful and greenest countries in the world</li>
<li>be part of a small, high-performance team </li>
<li>competitive salary</li>
<li>jandals, marmite and hokey pokey ice cream</li>
</ul>
<div style="text-align: justify;">
<br />
For more information, contact me at sam.lapere@live.be<br />
<br />
If you are interested, send your CV and cover letter to sam.lapere@live.be. Applications will close on 16 December or when we find the right people. (Update: spots are filling up quickly, so we brought the closing date forward by five days.) </div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com6tag:blogger.com,1999:blog-7277449027963623452.post-10273569477614153562016-11-14T22:57:00.001-08:002016-11-27T19:37:10.834-08:00OpenCL path tracing tutorial 2: path tracing spheres<div style="text-align: justify;">
This tutorial consists of two parts: the first part will describe how to ray trace one sphere using OpenCL, while the second part covers path tracing of a scene made of spheres. The tutorial will be light on ray tracing/path tracing theory (there are plenty of excellent resources available online, such as Scratch-a-Pixel) and will focus instead on the practical implementation of rendering algorithms in OpenCL. The end result will be a rendered image featuring realistic light effects such as indirect lighting, diffuse colour bleeding and soft shadows, all achieved with just a few lines of code:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjByv8FAv63VvuL8Jtw-le6wXqQbgtyCfbfiNqvNU2pC3edgZV8zUZ21TH4owNxB7SC7j1DOyT4YoT6rLAlOzRCS2sK0S4mGfEbWbhShesNvZmNqWFXAPkxxt70nblVHM-DJvTgklpmM4A/s1600/opencl_raytracer_2000samps.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjByv8FAv63VvuL8Jtw-le6wXqQbgtyCfbfiNqvNU2pC3edgZV8zUZ21TH4owNxB7SC7j1DOyT4YoT6rLAlOzRCS2sK0S4mGfEbWbhShesNvZmNqWFXAPkxxt70nblVHM-DJvTgklpmM4A/s400/opencl_raytracer_2000samps.png" width="400" /></a></div>
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #6fa8dc; font-size: large;"><b><span style="font-size: medium;">Part 1: Ray tracing a sphere</span></b></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #3d85c6; font-size: small;"><b>Computing a test image on the OpenCL device</b></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">The host (CPU) sets up the OpenCL environment and launches the OpenCL kernel, which will be executed on the OpenCL device (GPU or CPU) in parallel. Each work item (or thread) on the device will calculate one pixel of the image. There will thus be as many work items in the global pool as there are pixels in the image. Each work item has a unique ID which distinguishes it from all other work items in the global pool of threads and which is obtained with </span><span style="font-size: small;"><span style="color: #38761d;"><span style="color: #6fa8dc;"><b>get_global_id(0)</b></span></span>. </span>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">The X- and Y-coordinates of each pixel can be computed by using that pixel's unique work item ID: </span></div>
<ul style="text-align: justify;">
<li><span style="font-size: small;"><span style="font-size: small;">x-coordinate: </span>divide the work item ID by the image width and take the remainder</span></li>
<li><span style="font-size: small;">y-coordinate: divide the work item ID by the image width</span></li>
</ul>
<div style="text-align: justify;">
<span style="font-size: small;">Remapping the x and y coordinates from the [0, width] range for x and the [0, height] range for y to the range [0, 1] for both, and plugging those values into the red and green channels respectively, yields the following gradient image (the image is saved in PPM format, which can be opened with e.g. IrfanView or GIMP):</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpg5Mlx4mBIkh1iOE_WTT2ciwhz_DME-vm7L6Wy2uXRXeVT4jj2l0XUhfUuR8bqIeWXma7_vqo1XgVVWYujbSUQUFWi1vaksUNlI-TIe_XLWWtFyHnDt00nC6I0-BlTl5h-IqwLSpfJfI/s1600/opencl_simple_gradient.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpg5Mlx4mBIkh1iOE_WTT2ciwhz_DME-vm7L6Wy2uXRXeVT4jj2l0XUhfUuR8bqIeWXma7_vqo1XgVVWYujbSUQUFWi1vaksUNlI-TIe_XLWWtFyHnDt00nC6I0-BlTl5h-IqwLSpfJfI/s400/opencl_simple_gradient.png" width="400" /></a></div>
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;"><span style="font-size: small;"><span style="font-size: small;">The OpenCL code to generate this image:</span></span> </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">
</span>
<br />
<div style="height: 16em; overflow: auto; width: 42em;">
<pre style="overflow-x: auto; width: 70em;"><!-- HTML generated using hilite.me --><div style="background: #000000; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;">1
2
3
4
5
6
7
8
9</pre>
</td><td><pre style="line-height: 125%; margin: 0;">__kernel <span style="color: #2b91af;">void</span> render_kernel(__global float3* output, <span style="color: #2b91af;">int</span> width, <span style="color: #2b91af;">int</span> height)
{
<span style="color: blue;">const</span> <span style="color: #2b91af;">int</span> work_item_id = get_global_id(0); <span style="color: green;">/* the unique global id of the work item for the current pixel */</span>
<span style="color: #2b91af;">int</span> x = work_item_id % width; <span style="color: green;">/* x-coordinate of the pixel */</span>
<span style="color: #2b91af;">int</span> y = work_item_id / width; <span style="color: green;">/* y-coordinate of the pixel */</span>
<span style="color: #2b91af;">float</span> fx = (<span style="color: #2b91af;">float</span>)x / (<span style="color: #2b91af;">float</span>)width; <span style="color: green;">/* convert int to float in range [0-1] */</span>
<span style="color: #2b91af;">float</span> fy = (<span style="color: #2b91af;">float</span>)y / (<span style="color: #2b91af;">float</span>)height; <span style="color: green;">/* convert int to float in range [0-1] */</span>
output[work_item_id] = (float3)(fx, fy, 0); <span style="color: green;">/* simple interpolated colour gradient based on pixel coordinates */</span>
}
</pre>
</td></tr>
</tbody></table>
</div>
<span style="font-size: small;">
<span style="font-size: small;">
</span></span></pre>
</div>
<span style="font-size: small;">
</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">Now let's use the OpenCL device for some ray tracing. </span></div>
<div style="text-align: justify;">
<br />
<br /></div>
<div style="text-align: justify;">
<span style="color: #3d85c6; font-size: small;"><b>Ray tracing a sphere with OpenCL</b></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">We first define a <span style="color: #ff6600;"><b>Ray</b></span> and a <span style="color: #ff6600;"><b>Sphere</b></span> struct in the OpenCL code:</span><br />
<br />
<div style="text-align: justify;">
<span style="font-size: small;">A <span style="color: #ff6600;"><b>Ray</b></span> has </span><br />
<ul>
<li><span style="font-size: small;">an <span style="color: #ff6600;"><b>origin</b></span> in 3D space (3 floats for x, y, z coordinates) </span></li>
<li><span style="font-size: small;">a <span style="color: #ff6600;"><b>direction</b></span> in 3D space (3 floats for the x, y, z coordinates of the 3D vector)</span></li>
</ul>
</div>
</div>
<div style="text-align: justify;">
<span style="font-size: small;">A <span style="color: #ff6600;"><b>Sphere</b></span> has </span><br />
<ul>
<li><span style="font-size: small;">a <span style="color: #ff6600;"><b>radius</b></span>, </span></li>
<li><span style="font-size: small;">a <span style="color: #ff6600;"><b>position</b></span> in 3D space (3 floats for x, y, z coordinates), </span></li>
<li><span style="font-size: small;">an object <span style="color: #ff6600;"><b>colour</b></span> (3 floats for the Red, Green and Blue channel) </span></li>
<li><span style="font-size: small;">an <span style="color: #ff6600;"><b>emission</b></span> colour (again 3 floats for each of the RGB channels)</span></li>
</ul>
</div>
<div style="text-align: justify;">
<br />
<div style="height: 18em; overflow: auto; width: 40em;">
<pre style="overflow-x: auto; width: 40em;"><!-- HTML generated using hilite.me --><div style="background: #000000; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: blue;">struct</span> Ray{
float3 origin;
float3 dir;
};
<span style="color: blue;">struct</span> Sphere{
<span style="color: #2b91af;">float</span> radius;
float3 pos;
float3 emi;
float3 color;
};
</pre>
</td></tr>
</tbody></table>
</div>
</pre>
</div>
</div>
<div style="text-align: justify;">
<br />
<span style="font-size: small;"><span style="color: #3d85c6;"><b>Camera ray generation</b></span> </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">Rays are shot from the camera (which is in a fixed position for this tutorial) through an imaginary grid of pixels into the scene, where they intersect with 3D objects (in this case spheres). For each pixel in the image, we will generate one camera ray (also called primary rays, view rays or eye rays) and follow or trace it into the scene. For camera rays, the ray origin is the camera position and the ray direction is the vector connecting the camera and the pixel on the screen.</span></div>
<div style="text-align: justify;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0WH3uuhYc4Tq51o6ux2YHQoMQvIFi2fw3k9EbL_IBLdHyw_T3EZnbrPm7RSuOh2a10_thZw8Orip55IMWG4E5EhzzJ7t61ESyPPw1aUxlhhFVdT5uJutjSamdUN_bamVLrQY55h47Zl8/s1600/Ray_trace_diagram_wiki.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="263" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0WH3uuhYc4Tq51o6ux2YHQoMQvIFi2fw3k9EbL_IBLdHyw_T3EZnbrPm7RSuOh2a10_thZw8Orip55IMWG4E5EhzzJ7t61ESyPPw1aUxlhhFVdT5uJutjSamdUN_bamVLrQY55h47Zl8/s400/Ray_trace_diagram_wiki.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Source: Wikipedia</td></tr>
</tbody></table>
<div style="text-align: justify;">
<br />
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">The OpenCL code for generating a camera ray:</span><br />
<br />
<span style="font-size: small;">
</span>
<br />
<div style="height: 32em; overflow: auto; width: 42em;">
<pre style="overflow-x: auto; width: 70em;"><!-- HTML generated using hilite.me --><div style="background: #000000; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: blue;">struct</span> Ray createCamRay(<span style="color: blue;">const</span> <span style="color: #2b91af;">int</span> x_coord, <span style="color: blue;">const</span> <span style="color: #2b91af;">int</span> y_coord, <span style="color: blue;">const</span> <span style="color: #2b91af;">int</span> width, <span style="color: blue;">const</span> <span style="color: #2b91af;">int</span> height){
<span style="color: #2b91af;">float</span> fx = (<span style="color: #2b91af;">float</span>)x_coord / (<span style="color: #2b91af;">float</span>)width; <span style="color: green;">/* convert int in range [0 - width] to float in range [0-1] */</span>
<span style="color: #2b91af;">float</span> fy = (<span style="color: #2b91af;">float</span>)y_coord / (<span style="color: #2b91af;">float</span>)height; <span style="color: green;">/* convert int in range [0 - height] to float in range [0-1] */</span>
<span style="color: green;">/* calculate aspect ratio */</span>
<span style="color: #2b91af;">float</span> aspect_ratio = (<span style="color: #2b91af;">float</span>)(width) / (<span style="color: #2b91af;">float</span>)(height);
<span style="color: #2b91af;">float</span> fx2 = (fx - 0.5f) * aspect_ratio;
<span style="color: #2b91af;">float</span> fy2 = fy - 0.5f;
<span style="color: green;">/* determine position of pixel on screen */</span>
float3 pixel_pos = (float3)(fx2, -fy2, 0.0f);
<span style="color: green;">/* create camera ray*/</span>
<span style="color: blue;">struct</span> Ray ray;
ray.origin = (float3)(0.0f, 0.0f, 40.0f); <span style="color: green;">/* fixed camera position */</span>
ray.dir = normalize(pixel_pos - ray.origin);
<span style="color: blue;">return</span> ray;
}
</pre>
</td></tr>
</tbody></table>
</div>
<span style="font-size: small;">
</span></pre>
</div>
<span style="font-size: small;">
</span></div>
<div style="text-align: justify;">
<span style="font-size: small;"><br /></span>
<span style="font-size: small;">
</span><br />
<span style="font-size: small;">
</span></div>
<div style="text-align: justify;">
<span style="font-size: small;"><br /></span>
<br />
<span style="color: #3d85c6;"><span style="font-size: small;"><b>Ray-sphere intersection</b></span></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">To find the intersection of a ray with a sphere, we need the parametric equation of a line, which expresses the <span style="color: #ff6600;"><b>distance from the ray origin to the intersection point along the ray direction</b></span> in terms of the parameter <b>"t"</b></span>: </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #ff6600;"><span style="font-size: small;"><b>intersection point = ray origin + ray direction * t</b></span></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">The equation of a sphere follows from the Pythagorean theorem in 3D (</span><span style="font-size: small;"><span style="font-size: small;">all points on the surface of a sphere</span> are located at a distance of radius <b>r</b> from its center): </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #ff6600;"><span style="font-size: small;"><b>(sphere surface point - sphere center)</b></span></span><span style="font-size: small;"><span style="color: #ff6600;"><b><span style="font-size: small;"><sup>2</sup></span> = radius<sup>2</sup></b></span> </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In the case of a sphere centered at the origin (with coordinates [0,0,0]), the vector [sphere surface point - sphere center] reduces to the coordinates of a point on the sphere's surface (the intersection point). Combining both equations then gives<br />
<br />
<div style="text-align: justify;">
<b><span style="color: #ff6600;">(ray origin + ray direction * t)<sup>2</sup> = radius<sup>2</sup></span> </b><span style="font-size: small;"><b> </b></span></div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">Expanding this equation in a <span style="color: #ff6600;"><b>quadratic equation</b></span> of the form <span style="color: #ff6600;"><b>ax</b></span></span><span style="font-size: small;"><span style="color: #ff6600;"><b><span style="font-size: small;"><span style="font-size: small;"><sup>2</sup></span></span> + bx + c = 0</b></span> where </span><br />
<ul>
<li><span style="font-size: small;">a = (ray direction) . (ray direction) </span><span style="font-size: small;"> </span></li>
<li><span style="font-size: small;">b = 2 * (ray direction) . (ray origin to sphere center)</span><span style="font-size: small;"> </span></li>
<li><span style="font-size: small;">c = </span><span style="font-size: small;"><span style="font-size: small;">(</span></span><span style="font-size: small;"><span style="font-size: small;"><span style="font-size: small;">ray origin to sphere center</span>) </span>. </span><span style="font-size: small;"><span style="font-size: small;">(</span></span><span style="font-size: small;"><span style="font-size: small;"><span style="font-size: small;">ray origin to sphere center</span>)</span> - </span><span style="font-size: small;"><span style="font-size: small;"><span style="font-size: small;">radius<sup>2</sup></span></span> </span></li>
</ul>
</div>
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
<span style="font-size: small;">yields solutions for <b>t</b> (the distance to the point where the ray intersects the sphere) given by the <span style="color: #ff6600;"><b>quadratic formula</b></span> <b><span style="color: #ff6600;">t = (−b ± √(b<sup>2</sup> − 4ac)) / 2a</span></b> (where <b><span style="color: #ff6600;">b<sup>2</sup> − 4ac</span></b> is called the discriminant).</span><br />
<br />
Depending on whether the discriminant is negative, zero or positive, there are zero (the ray misses the sphere), one (the ray just touches the sphere at one point) or two solutions (the ray fully intersects the sphere at two points) respectively. The distance t can be positive (intersection in front of the ray origin) or negative (intersection behind the ray origin). The details of the mathematical derivation are explained <span style="font-size: small;">in <a href="http://www.scratchapixel.com/lessons/3d-basic-rendering/minimal-ray-tracer-rendering-simple-shapes/ray-sphere-intersection">this Scratch-a-Pixel article</a>.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">The ray-sphere intersection code below is optimised in two ways: the "a" coefficient is omitted, because it is the dot product of the normalised ray direction with itself, which equals 1, and the factor 2 in the "b" coefficient is dropped (the "half-b" simplification), which reduces the solutions to t = b ± √(b<sup>2</sup> − c). Taking the square root of the discriminant (an expensive operation) should only be performed when the discriminant is non-negative.</span></div>
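Concretely, with the ray direction normalised, the quadratic collapses to the simplified form used in the code below. This is a sketch of the derivation, where b denotes the halved coefficient dot(rayToCenter, ray->dir) that appears in the code:

```latex
% sphere: |P - C|^2 = r^2,  ray: P = O + t\,d,  |d| = 1
% let L = C - O (rayToCenter),  b = d \cdot L,  c = L \cdot L - r^2
t^2 - 2\,(d \cdot L)\,t + (L \cdot L - r^2) = 0
\quad\Longrightarrow\quad
t = b \pm \sqrt{b^2 - c}
```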
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">
</span>
<br />
<div style="height: 34em; overflow: auto; width: 42em;">
<pre style="overflow-x: auto; width: 70em;"><!-- HTML generated using hilite.me --><div style="background: #000000; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: #2b91af;">bool</span> intersect_sphere(<span style="color: blue;">const</span> <span style="color: blue;">struct</span> Sphere* sphere, <span style="color: blue;">const</span> <span style="color: blue;">struct</span> Ray* ray, <span style="color: #2b91af;">float</span>* t)
{
float3 rayToCenter = sphere->pos - ray->origin;
<span style="color: green;">/* calculate coefficients a, b, c from quadratic equation */</span>
<span style="color: green;">/* float a = dot(ray->dir, ray->dir); // ray direction is normalised, dotproduct simplifies to 1 */</span>
<span style="color: #2b91af;">float</span> b = dot(rayToCenter, ray->dir);
<span style="color: #2b91af;">float</span> c = dot(rayToCenter, rayToCenter) - sphere->radius*sphere->radius;
<span style="color: #2b91af;">float</span> disc = b * b - c; <span style="color: green;">/* discriminant of quadratic formula */</span>
<span style="color: green;">/* solve for t (distance to hitpoint along ray) */</span>
<span style="color: blue;">if</span> (disc < 0.0f) <span style="color: blue;">return</span> false;
<span style="color: blue;">else</span> *t = b - sqrt(disc);
<span style="color: blue;">if</span> (*t < 0.0f){
*t = b + sqrt(disc);
<span style="color: blue;">if</span> (*t < 0.0f) <span style="color: blue;">return</span> false;
}
<span style="color: blue;">return</span> true; <span style="color: green;">/* report the hit on both paths (the original fell off the end of the function when the far intersection was taken) */</span>
}
</pre>
</td></tr>
</tbody></table>
</div>
<span style="font-size: small;">
</span></pre>
</div>
<span style="font-size: small;">
</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br />
<span style="color: #3d85c6;"><b>Scene initialisation</b></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;">For simplicity, </span><span style="font-size: small;"><span style="font-size: small;">in this first part of the tutorial </span>the scene will be initialised on the device in the kernel function (in the second part the scene will be initialised on the host and passed to OpenCL, which is more flexible and memory efficient, but also requires more care with regards to memory alignment and the use of memory address spaces). Every work item will thus have a local copy of the scene (in this case a single sphere).</span></div>
<br />
<div style="height: 42em; overflow: auto; width: 42em;">
<pre style="overflow-x: auto; width: 70em;"><!-- HTML generated using hilite.me --><div style="background: #000000; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29</pre>
</td><td><pre style="line-height: 125%; margin: 0;">__kernel <span style="color: #2b91af;">void</span> render_kernel(__global float3* output, <span style="color: #2b91af;">int</span> width, <span style="color: #2b91af;">int</span> height)
{
<span style="color: blue;">const</span> <span style="color: #2b91af;">int</span> work_item_id = get_global_id(0); <span style="color: green;">/* the unique global id of the work item for the current pixel */</span>
<span style="color: #2b91af;">int</span> x_coord = work_item_id % width; <span style="color: green;">/* x-coordinate of the pixel */</span>
<span style="color: #2b91af;">int</span> y_coord = work_item_id / width; <span style="color: green;">/* y-coordinate of the pixel */</span>
<span style="color: green;">/* create a camera ray */</span>
<span style="color: blue;">struct</span> Ray camray = createCamRay(x_coord, y_coord, width, height);
<span style="color: green;">/* create and initialise a sphere */</span>
<span style="color: blue;">struct</span> Sphere sphere1;
sphere1.radius = 0.4f;
sphere1.pos = (float3)(0.0f, 0.0f, 3.0f);
sphere1.color = (float3)(0.9f, 0.3f, 0.0f);
<span style="color: green;">/* intersect ray with sphere */</span>
<span style="color: #2b91af;">float</span> t = 1e20;
intersect_sphere(&sphere1, &camray, &t);
<span style="color: green;">/* if ray misses sphere, return background colour </span>
<span style="color: green;"> background colour is a blue-ish gradient dependent on image height */</span>
<span style="color: #2b91af;">float</span> fy = (<span style="color: #2b91af;">float</span>)y_coord / (<span style="color: #2b91af;">float</span>)height; <span style="color: green;">/* normalised y-coordinate for the gradient (not defined in the original snippet) */</span>
<span style="color: blue;">if</span> (t > 1e19){
output[work_item_id] = (float3)(fy * 0.1f, fy * 0.3f, 0.3f);
<span style="color: blue;">return</span>;
}
<span style="color: green;">/* if ray hits the sphere, it will return the sphere colour*/</span>
output[work_item_id] = sphere1.color;
}
</pre>
</td></tr>
</tbody></table>
</div>
</pre>
</div>
<br />
<br />
<br />
<b><span style="color: #6fa8dc;">Running the ray tracer</span> </b> <br />
<br />
Now we've got everything we need to start ray tracing! Let's begin with a plain colour sphere. When the ray misses the sphere, the background colour is returned:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipxQO4vEyoJqjE48NbxXaqf4LP4jfWZJEux4BnQs_dymUJsT8BDgvhnw2yubgcs4S1upoonfb8rC9qo6X7yBpOpRiOSgV8EKgbdYCpthuwVpbHJWQCc82ZwWWpYYJxEFG9G-lh6roTpBo/s1600/opencl_raytracer1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipxQO4vEyoJqjE48NbxXaqf4LP4jfWZJEux4BnQs_dymUJsT8BDgvhnw2yubgcs4S1upoonfb8rC9qo6X7yBpOpRiOSgV8EKgbdYCpthuwVpbHJWQCc82ZwWWpYYJxEFG9G-lh6roTpBo/s400/opencl_raytracer1.png" width="400" /></a></div>
<br />
A more interesting sphere with cosine-weighted colours, giving the impression of front lighting.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJ_OV7BkKcwxHz2ajnTHThyptAs41TQCctow47_UY2WqoIPst-K6DAiAjk7W30wQL94q9KuC37DNgIFHxys_K2mrDWUdFBZtJdhRToEm9Y31ihr-kvy4gVVupgKKH-Gsj_QfNSFKWsFAg/s1600/opencl_raytracer2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJ_OV7BkKcwxHz2ajnTHThyptAs41TQCctow47_UY2WqoIPst-K6DAiAjk7W30wQL94q9KuC37DNgIFHxys_K2mrDWUdFBZtJdhRToEm9Y31ihr-kvy4gVVupgKKH-Gsj_QfNSFKWsFAg/s400/opencl_raytracer2.png" width="400" /></a></div>
<br />
To achieve this effect we need to calculate the angle between the ray hitting the sphere surface and the normal at that point. The sphere normal at a specific intersection point on the surface is just the normalised vector (with unit length) going from the sphere center to that intersection point. <br />
<br />
<div style="height: 10em; overflow: auto; width: 42em;">
<pre style="overflow-x: auto; width: 70em;"><!-- HTML generated using hilite.me --><div style="background: #000000; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;">1
2
3
4
5</pre>
</td><td><pre style="line-height: 125%; margin: 0;"> float3 hitpoint = camray.origin + camray.dir * t;
float3 normal = normalize(hitpoint - sphere1.pos);
<span style="color: #2b91af;">float</span> cosine_factor = dot(normal, camray.dir) * -1.0f;
output[work_item_id] = sphere1.color * cosine_factor;
</pre>
</td></tr>
</tbody></table>
</div>
</pre>
</div>
<br />
<br />
Adding a stripe pattern by multiplying the colour with the sine of the height of the intersection point:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjT-v5dxDBHcD1wAa2_676CgqA6xozgiGjlBvcnGCd9GInlUnCY6vu7VW5UBt-r7qFUTNG18xODu1vRGmj5NaM26PC7KnBMs8cGgygMdjV32a4BQSTmVTyGr9u38sMj6ucgE2ewJTjxEvM/s1600/opencl_raytracer3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjT-v5dxDBHcD1wAa2_676CgqA6xozgiGjlBvcnGCd9GInlUnCY6vu7VW5UBt-r7qFUTNG18xODu1vRGmj5NaM26PC7KnBMs8cGgygMdjV32a4BQSTmVTyGr9u38sMj6ucgE2ewJTjxEvM/s400/opencl_raytracer3.png" width="400" /></a></div>
<br />
A screen-door effect using sine functions in both the x- and y-directions:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8c5h3p5GDBWg-3-LZyyR8nPAUKxEy4PycuAMIwA2HnXWdRhOshmz82AwxEL13-yawDxhzFbTrTzR1TYh93f7QtvC3q_8Gwh7nuVSB8Fr93QpfPkHiab2m09GfCGSMvM3NFon-ti0ubQk/s1600/opencl_raytracer4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8c5h3p5GDBWg-3-LZyyR8nPAUKxEy4PycuAMIwA2HnXWdRhOshmz82AwxEL13-yawDxhzFbTrTzR1TYh93f7QtvC3q_8Gwh7nuVSB8Fr93QpfPkHiab2m09GfCGSMvM3NFon-ti0ubQk/s400/opencl_raytracer4.png" width="400" /></a></div>
<br />
Showing the surface normals (calculated in the code snippet above) as colours:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd3Yq-DCTkd4f2Mn-EoxvpazkeRDm_GTmabsAOJLK8qf2daoq4wxGYobL4hidMmkR6zf736-nMWAP-KO-7A7pSR0VFq7spBMO5MRDE8OOWGrIOLHJzCUhCmA9SiliwkoRulSQq5_N5pjE/s1600/opencl_raytracer5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd3Yq-DCTkd4f2Mn-EoxvpazkeRDm_GTmabsAOJLK8qf2daoq4wxGYobL4hidMmkR6zf736-nMWAP-KO-7A7pSR0VFq7spBMO5MRDE8OOWGrIOLHJzCUhCmA9SiliwkoRulSQq5_N5pjE/s400/opencl_raytracer5.png" width="400" /></a></div>
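The exact normal-to-colour mapping used for the screenshot is not shown in the post; a common choice, sketched here in host-side C, is to remap each component of the unit normal from [-1, 1] to the displayable range [0, 1]:

```c
/* stand-in vector type for the kernel's float3 */
typedef struct { float x, y, z; } vec3;

/* map a unit surface normal to an RGB colour by remapping each
   component from [-1, 1] to [0, 1] */
vec3 normal_to_colour(vec3 n)
{
    vec3 c = { 0.5f * (n.x + 1.0f),
               0.5f * (n.y + 1.0f),
               0.5f * (n.z + 1.0f) };
    return c;
}
```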
<br />
<br />
<span style="color: #6fa8dc;"><span style="font-size: large;"><b>Source code </b></span></span><br />
<br />
<a href="https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-1-Raytracing-a-sphere">https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-1-Raytracing-a-sphere</a><br />
<br />
<br />
<span style="color: #6fa8dc;"><span style="font-size: large;"><b>Download demo</b></span></span> (works on AMD, Nvidia and Intel) <br />
<br />
The executable demo will render the above images.<br />
<br />
<a href="https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-1-Raytracing-a-sphere/releases/tag/1.0">https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-1-Raytracing-a-sphere/releases/tag/1.0</a><br />
<br />
<br />
<br />
<div style="text-align: justify;">
<span style="color: #6fa8dc; font-size: large;"><span style="font-size: large;"><b>Part 2: Path tracing spheres</b></span></span></div>
<div style="text-align: justify;">
<span style="font-size: large;"><br /></span></div>
<div style="text-align: justify;">
<span style="color: #3d85c6; font-size: small;"><b>Very quick overview of ray tracing and path tracing</b></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The following section covers the background of the ray tracing process in a very simplified way, but it should be sufficient to understand the code in this tutorial. Scratch-a-Pixel provides a much more detailed explanation of ray tracing.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #ff6600;"><b>Ray tracing</b></span> is a general term that encompasses ray casting, Whitted ray tracing, distribution ray tracing and path tracing. So far, we have only traced rays from the camera (so called "camera rays", "eye rays" or "<span style="color: #ff6600;"><b>primary rays</b></span>") into the scene, a process called <span style="color: #ff6600;"><b>ray casting</b></span>, resulting in plainly coloured images with no lighting. In order to achieve effects like shadows and reflections, new rays must be generated at the points where the camera rays intersect with the scene. These <span style="color: #ff6600;"><b>secondary rays</b></span> can be shadow rays, reflection rays, transmission rays (for refractions), ambient occlusion rays or diffuse interreflection rays (for indirect lighting/global illumination). For example, shadow rays used for direct lighting are generated to point directly towards a light source while reflection rays are pointed in (or near) the direction of the reflection vector. For now we will skip direct lighting to generate shadows and go straight to path tracing, which is strangely enough easier to code, creates more realistic and prettier pictures and is just more fun.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In (plain) <span style="color: #ff6600;"><b>path tracing</b></span>, rays are shot from the camera and bounce off the surface of scene objects in a <b><span style="color: #ff6600;">random direction</span></b> (like a high-energy bouncing ball), forming a <b><span style="color: #ff6600;">chain of random rays connected together into a path</span></b>. If the path hits a light emitting object such as a light source, it will return a colour which depends on the surface colours of all the objects encountered so far along the path, the colour of the light emitters, the angles at which the path hit a surface and the angles at which the path bounced off a surface. These ideas form the essence of the "<span style="color: #ff6600;"><b>rendering equation</b></span>", proposed in a paper with the same name by Jim Kajiya in 1986.</div>
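For reference, a common statement of the rendering equation: the outgoing radiance at a surface point x is the emitted radiance plus the incoming radiance from every direction on the hemisphere, weighted by the surface's BRDF and the cosine of the incident angle:

```latex
L_o(x, \omega_o) = L_e(x, \omega_o)
  + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, \mathrm{d}\omega_i
```

Path tracing estimates this integral by Monte Carlo sampling: each random path is one sample of the integrand.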
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Since the directions of the rays in a path are generated randomly, some paths will hit a light source while others won't, resulting in noise ("variance" in statistics due to random sampling). The noise can be reduced by shooting many random paths per pixel (= taking many <span style="color: #ff6600;"><b>samples</b></span>) and averaging the results.</div>
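The averaging step itself is simple; in the sketch below an array of precomputed sample values stands in for repeated calls to the path tracing function:

```c
/* average the colours returned by `samples` random paths through the
   same pixel; with more samples the Monte Carlo noise shrinks */
float average_samples(const float *sample_values, int samples)
{
    float sum = 0.0f;
    for (int i = 0; i < samples; i++)
        sum += sample_values[i];
    return sum / (float)samples;
}
```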
<div style="text-align: justify;">
<br />
<br /></div>
<div style="text-align: justify;">
<span style="color: #6fa8dc; font-size: small;"><span style="font-size: medium;"><b>Implementation of (plain) path tracing in OpenCL</b></span></span> </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The code for the path tracer is based on smallpt from Kevin Beason and is largely the same as the ray tracer code from part 1 of this tutorial, with some important differences on the host side:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- the scene is initialised on the host (CPU) side, which requires a host version of the Sphere struct. Correct <span style="color: #ff6600;"><b>memory alignment</b></span> in the host struct is very important to avoid shifting of values and wrongly initialised variables in the OpenCL struct, especially when using OpenCL's built-in data types such as float3 and float4. If necessary, the struct should be padded with dummy variables to ensure memory alignment (the total size of the struct must be a multiple of the size of float4).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<div style="height: 17em; overflow: auto; width: 42em;">
<pre style="overflow-x: auto; width: 70em;"><!-- HTML generated using hilite.me --><div style="background: #000000; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: blue;">struct</span> Sphere
{
cl_float radius;
cl_float dummy1;
cl_float dummy2;
cl_float dummy3;
cl_float3 position;
cl_float3 color;
cl_float3 emission;
};
</pre>
</td></tr>
</tbody></table>
</div>
</pre>
</div>
</div>
<div style="text-align: justify;">
<br /></div>
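To see why this layout satisfies the padding rule, here is a host-side sketch with stand-in typedefs for the OpenCL types (cl_float3 occupies 16 bytes, the same as cl_float4): four floats plus three 16-byte vectors give a 64-byte struct, a multiple of sizeof(cl_float4) = 16, so the host and device layouts line up.

```c
/* stand-ins for the OpenCL host types (assumption for illustration;
   the real types come from cl_platform.h) */
typedef float cl_float;
typedef struct { float s[4]; } cl_float3;   /* occupies 16 bytes */

struct Sphere
{
    cl_float radius;
    cl_float dummy1;
    cl_float dummy2;
    cl_float dummy3;
    cl_float3 position;
    cl_float3 color;
    cl_float3 emission;
};

/* compile-time check of the padding rule (C11) */
_Static_assert(sizeof(struct Sphere) % 16 == 0,
               "Sphere must be padded to a multiple of 16 bytes");
```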
<div style="text-align: justify;">
- the scene (an array of spheres) is copied from the host to the OpenCL device into global memory (using <span style="color: #ff6600;"><b>CL_MEM_READ_WRITE</b></span>) or constant memory (using <span style="color: #ff6600;"><b>CL_MEM_READ_ONLY</b></span>) </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<div style="height: 17em; overflow: auto; width: 42em;">
<pre style="overflow-x: auto; width: 70em;"><!-- HTML generated using hilite.me --><div style="background: #000000; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;">1
2
3
4
5
6
7
8
9</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: green;">// initialise scene</span>
<span style="color: blue;">const</span> <span style="color: #2b91af;">int</span> sphere_count = 9;
Sphere cpu_spheres[sphere_count];
initScene(cpu_spheres);
<span style="color: green;">// Create buffers on the OpenCL device for the image and the scene</span>
cl_output = Buffer(context, CL_MEM_WRITE_ONLY, image_width * image_height * <span style="color: blue;">sizeof</span>(cl_float3));
cl_spheres = Buffer(context, CL_MEM_READ_ONLY, sphere_count * <span style="color: blue;">sizeof</span>(Sphere));
queue.enqueueWriteBuffer(cl_spheres, CL_TRUE, 0, sphere_count * <span style="color: blue;">sizeof</span>(Sphere), cpu_spheres);
</pre>
</td></tr>
</tbody></table>
</div>
</pre>
</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- explicit memory management: once the scene is on the device, its pointer can be passed on to other device functions preceded by the keyword "<span style="color: #ff6600;"><b>__global</b></span>" or "<span style="color: #ff6600;"><b>__constant</b></span>".</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- the host code automatically determines the local size of the kernel work group (the number of work items or "threads" per work group) by calling the OpenCL function <b>kernel.getWorkGroupInfo&lt;CL_KERNEL_WORK_GROUP_SIZE&gt;(device)</b></div>
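A common follow-up step (sketched here, not taken verbatim from the host code) is rounding the global work size up to a multiple of the returned local size, so that the last partial work group of pixels is not dropped:

```c
#include <stddef.h>

/* round the global work size (one work item per pixel) up to a
   multiple of the local work group size; the function name is
   illustrative */
size_t round_up_global_size(size_t num_pixels, size_t local_size)
{
    if (num_pixels % local_size == 0)
        return num_pixels;
    return (num_pixels / local_size + 1) * local_size;
}
```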
<div style="text-align: justify;">
<br />
<br />
<span style="color: #6fa8dc; font-size: small;"><span style="font-size: medium;"><b>The actual path tracing function</b></span></span><br />
<br /></div>
<div style="text-align: justify;">
- <span style="color: #ff6600;"><b>iterative path tracing function</b></span>: since OpenCL does not support recursion, the trace() function traces paths iteratively (instead of recursively) using a loop with a fixed number of bounces (iterations), representing path depth.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- each path starts off with an "<span style="color: #ff6600;"><b>accumulated colour</b></span>" initialised to black and a "<span style="color: #ff6600;"><b>mask colour</b></span>" initialised to pure white. The mask colour "collects" surface colours along its path by multiplication. The accumulated colour accumulates light from emitters along its path by adding emitted colours multiplied by the mask colour. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- <span style="color: #ff6600;"><b>generating random ray directions</b></span>: new rays start at the hitpoint and get shot in a random direction by sampling a random point on the hemisphere above the surface hitpoint. For each new ray, a local orthogonal uvw-coordinate system and two random numbers are generated: one to pick a random value on the horizon for the azimuth, the other for the altitude (with the zenith being the highest point) </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- <span style="color: #ff6600;"><b>diffuse materials</b></span>: the code for this tutorial only supports diffuse materials, which reflect incident light almost uniformly in all directions (in the hemisphere above the hitpoint)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- <b><span style="color: #ff6600;">cosine-weighted importance sampling</span>: </b>because diffuse light reflection is not truly uniform, the light contribution from rays that are pointing away from the surface plane and closer to the surface normal is greater. Cosine-weighted importance sampling favours rays that are pointing away from the surface plane by multiplying their colour with the cosine of the angle between the surface normal and the ray direction.<br />
<br />
- while ray tracing can get away with tracing only one ray per pixel to render a good image (more are needed for anti-aliasing and blurry effects like depth-of-field and glossy reflections), the inherently noisy nature of path tracing requires tracing of <span style="color: #ff6600;"><b>many paths per pixel</b></span> (samples per pixel) and averaging the results to reduce noise to an acceptable level.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<div style="height: 50em; overflow: auto; width: 42em;">
<pre style="overflow-x: auto; width: 70em;"><!-- HTML generated using hilite.me --><div style="background: #000000; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56</pre>
</td><td><pre style="line-height: 125%; margin: 0;">float3 trace(__constant Sphere* spheres, <span style="color: blue;">const</span> Ray* camray, <span style="color: blue;">const</span> <span style="color: #2b91af;">int</span> sphere_count, <span style="color: blue;">const</span> <span style="color: #2b91af;">int</span>* seed0, <span style="color: blue;">const</span> <span style="color: #2b91af;">int</span>* seed1){
Ray ray = *camray;
float3 accum_color = (float3)(0.0f, 0.0f, 0.0f);
float3 mask = (float3)(1.0f, 1.0f, 1.0f);
<span style="color: blue;">for</span> (<span style="color: #2b91af;">int</span> bounces = 0; bounces < 8; bounces++){
<span style="color: #2b91af;">float</span> t; <span style="color: green;">/* distance to intersection */</span>
<span style="color: #2b91af;">int</span> hitsphere_id = 0; <span style="color: green;">/* index of intersected sphere */</span>
<span style="color: green;">/* if ray misses scene, return background colour */</span>
<span style="color: blue;">if</span> (!intersect_scene(spheres, &ray, &t, &hitsphere_id, sphere_count))
<span style="color: blue;">return</span> accum_color += mask * (float3)(0.15f, 0.15f, 0.25f);
<span style="color: green;">/* else, we've got a hit! Fetch the closest hit sphere */</span>
Sphere hitsphere = spheres[hitsphere_id]; <span style="color: green;">/* version with local copy of sphere */</span>
<span style="color: green;">/* compute the hitpoint using the ray equation */</span>
float3 hitpoint = ray.origin + ray.dir * t;
<span style="color: green;">/* compute the surface normal and flip it if necessary to face the incoming ray */</span>
float3 normal = normalize(hitpoint - hitsphere.pos);
float3 normal_facing = dot(normal, ray.dir) < 0.0f ? normal : normal * (-1.0f);
<span style="color: green;">/* compute two random numbers to pick a random point on the hemisphere above the hitpoint*/</span>
<span style="color: #2b91af;">float</span> rand1 = 2.0f * PI * get_random(seed0, seed1);
<span style="color: #2b91af;">float</span> rand2 = get_random(seed0, seed1);
<span style="color: #2b91af;">float</span> rand2s = sqrt(rand2);
<span style="color: green;">/* create a local orthogonal coordinate frame centered at the hitpoint */</span>
float3 w = normal_facing;
float3 axis = fabs(w.x) > 0.1f ? (float3)(0.0f, 1.0f, 0.0f) : (float3)(1.0f, 0.0f, 0.0f);
float3 u = normalize(cross(axis, w));
float3 v = cross(w, u);
<span style="color: green;">/* use the coordinte frame and random numbers to compute the next ray direction */</span>
float3 newdir = normalize(u * cos(rand1)*rand2s + v*sin(rand1)*rand2s + w*sqrt(1.0f - rand2));
<span style="color: green;">/* add a very small offset to the hitpoint to prevent self intersection */</span>
ray.origin = hitpoint + normal_facing * EPSILON;
ray.dir = newdir;
<span style="color: green;">/* add the colour and light contributions to the accumulated colour */</span>
accum_color += mask * hitsphere.emission;
<span style="color: green;">/* the mask colour picks up surface colours at each bounce */</span>
mask *= hitsphere.color;
<span style="color: green;">/* perform cosine-weighted importance sampling for diffuse surfaces*/</span>
mask *= dot(newdir, normal_facing);
}
<span style="color: blue;">return</span> accum_color;
}
</pre>
</td></tr>
</tbody></table>
</div>
</pre>
</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br />
<br />
A screenshot made with the code above (also see the screenshot at the top of this post). Notice the colour bleeding (bounced colour reflected from the floor onto the spheres), soft shadows and lighting coming from the background.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS8joDZ17tHCZGxnTpNiApiol9Rt9fiDO8g1ffsRrIh66uWfbLMLE0Xfq6WwUEUaMJQTB1ce6NQQxcu32XcljYck6vxY1iG94xwstoB3hMQgXC8My7WsA_uxpe0r6vhwqMrUljWkut7N4/s1600/opencl_pathracer_spheres.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgS8joDZ17tHCZGxnTpNiApiol9Rt9fiDO8g1ffsRrIh66uWfbLMLE0Xfq6WwUEUaMJQTB1ce6NQQxcu32XcljYck6vxY1iG94xwstoB3hMQgXC8My7WsA_uxpe0r6vhwqMrUljWkut7N4/s400/opencl_pathracer_spheres.png" width="400" /></a></div>
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #6fa8dc;"><span style="font-size: large;"><b>Source code </b></span></span></div>
<div style="text-align: justify;">
<br />
<a href="https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-2-Path-tracing-spheres">https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-2-Path-tracing-spheres</a><br />
<br />
<br />
<span style="color: #6fa8dc;"><span style="font-size: large;"><b>Downloadable demo</b></span></span> (for AMD, Nvidia and Intel platforms, Windows only)<br />
<br />
<a href="https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-2-Path-tracing-spheres/releases/tag/1.0">https://github.com/straaljager/OpenCL-path-tracing-tutorial-2-Part-2-Path-tracing-spheres/releases/tag/1.0</a><br />
<br />
<br />
<b><span style="color: #6fa8dc;"><span style="font-size: large;">Useful resources</span></span></b><br />
<br />
- <a href="http://www.scratchapixel.com/">Scratch-a-pixel</a> is an excellent free online resource to learn about the theory behind ray tracing and path tracing. Many code samples (in C++) are also provided. <a href="http://www.scratchapixel.com/lessons/3d-basic-rendering/global-illumination-path-tracing">This article</a> gives a great introduction to global illumination and path tracing.<br />
<br />
- <a href="http://www.kevinbeason.com/smallpt/">smallpt by Kevin Beason</a> is a great little CPU path tracer in 100 lines of code. It formed the inspiration for the Cornell box scene and for many parts of the OpenCL code.</div>
<div style="text-align: justify;">
<br />
<br /></div>
<div style="text-align: justify;">
<span style="color: #6fa8dc;"><span style="font-size: large;"><b>Up next</b></span></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The next tutorial will cover the implementation of an interactive OpenGL viewport with a progressively refining image and an interactive camera with anti-aliasing and depth-of-field.</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com13tag:blogger.com,1999:blog-7277449027963623452.post-25430520884405365602016-11-01T21:57:00.002-07:002016-11-03T23:28:04.943-07:00OpenCL path tracing tutorial 1: Firing up OpenCL<div style="text-align: justify;">
This is the first tutorial in a new series of GPU path tracing tutorials which will focus on OpenCL based rendering. The first few tutorials will cover the very basics of getting started with OpenCL and OpenCL based ray tracing and path tracing of simple scenes. Follow-up tutorials will use a cut-down version of <a href="http://developer.amd.com/tools-and-sdks/graphics-development/radeonpro/radeonrays-technology-developers/">AMD's RadeonRays framework</a> (formerly known as FireRays) as a basis to which new features will be added in a modular manner, with the goal of incrementally working up to a full-featured GPU path tracer that includes all the features of RadeonRays. The <a href="https://github.com/GPUOpen-LibrariesAndSDKs/RadeonRays_SDK">RadeonRays source</a> also forms the basis of AMD's Radeon ProRender technology (which will also be integrated as a native GPU renderer in an <a href="https://www.maxon.net/en/news/maxon-blog/article/the-future-of-rendering/">upcoming version of Maxon's Cinema4D</a>). In the end, developers who are new to rendering should be able to code up their own GPU renderer and integrate it into their application. </div>
<br />
<br />
<div style="text-align: justify;">
<span style="color: #6fa8dc;"><b><span style="font-size: large;">Why OpenCL?</span></b></span> </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The major benefit of OpenCL is its platform independence, meaning that the same code can run on CPUs and GPUs made by AMD, Nvidia and Intel (in theory at least, in practice there are quite a few implementation differences between the various platforms). The tutorials in this series should thus run on any PC, regardless of GPU vendor (moreover a GPU is not even required to run the program). </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Another advantage of OpenCL is that it can use all the available CPU and GPUs in a system simultaneously to accelerate parallel workloads (such as rendering or physics simulations).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In order to achieve this flexibility, some boilerplate code is required to select an OpenCL platform (e.g. AMD or Nvidia) and one or more OpenCL devices (CPUs or GPUs). In addition, the OpenCL source must be compiled at runtime (unless the platform and device are known in advance), which adds some initialisation time when the program is first run.</div>
<div style="text-align: justify;">
<br />
<br /></div>
<div style="text-align: justify;">
<span style="color: #6fa8dc;"><b><span style="font-size: large;">OpenCL execution model quick overview</span></b></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This is a super-quick overview of the OpenCL execution model, just enough to get started (there are plenty of more exhaustive sources on OpenCL available on the web). </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In order to run an OpenCL program, the following structures are required (and are provided by the OpenCL API):</div>
<ul style="text-align: justify;">
<li><span style="color: #6fa8dc;"><b>Platform</b></span>: which vendor (AMD/Nvidia/Intel)</li>
<li><span style="color: #6fa8dc;"><b>Device</b></span>: CPU, GPU, APU or integrated GPU</li>
<li><span style="color: #6fa8dc;"><b>Context</b></span>: the runtime interface between the host (CPU) and device (GPU or CPU) which manages all the OpenCL resources (programs, kernels, command queue, buffers). It receives and distributes kernels and transfers data.</li>
<li><span style="color: #6fa8dc;"><b>Program</b></span>: the entire OpenCL program (one or more kernels and device functions)</li>
<li><span style="color: #6fa8dc;"><b>Kernel</b></span>: the starting point into the OpenCL program, analogous to the main() function in a CPU program. Kernels are called from the host (CPU). They represent the basic units of executable code that run on an OpenCL device and are preceded by the keyword "__kernel"</li>
<li><span style="color: #6fa8dc;"><b>Command queue</b></span>: the command queue allows kernel execution commands to be sent to the device (execution can be in-order or out-of-order)</li>
<li><b><span style="color: #6fa8dc;">Memory objects</span></b>: buffers and images </li>
</ul>
<div style="text-align: justify;">
These structures are summarised in the diagram below (slide from AMD's <a href="http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/01/Introduction_to_OpenCL_Programming-201005.pdf">Introduction to OpenCL programming</a>): </div>
<div style="text-align: justify;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVEHkkZ7bIl67frhKpmVLhBQ2OnR3ZDJPEOJS1Bm1-bkcVYoUZOSl7DpySMkGkg5K4aCkJBCjZkF9tgf7SfZVRd5muJutGxmeZAsRvtqo5jgXvDdRVhFD2YSQHCHv7Jch_74lX9tysp5Q/s1600/opencl_program_flow.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="295" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVEHkkZ7bIl67frhKpmVLhBQ2OnR3ZDJPEOJS1Bm1-bkcVYoUZOSl7DpySMkGkg5K4aCkJBCjZkF9tgf7SfZVRd5muJutGxmeZAsRvtqo5jgXvDdRVhFD2YSQHCHv7Jch_74lX9tysp5Q/s400/opencl_program_flow.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">OpenCL execution model </td></tr>
</tbody></table>
<div style="text-align: center;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #6fa8dc;"><b><span style="font-size: large;">OpenCL memory model quick overview</span></b></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The full details of the memory model are beyond the scope of this first tutorial, but we'll cover the basics here to get some understanding of how a kernel is executed on the device. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
There are four levels of memory on an OpenCL device, forming a memory hierarchy (from large and slow to tiny and fast memory):</div>
<ul style="text-align: justify;">
<li><b><span style="color: #6fa8dc;">Global memory</span> (similar to RAM)</b>: the largest but also slowest form of memory, can be read and written to by all work items (threads) and all work groups on the device and can also be read/written by the host (CPU). </li>
<li><span style="color: #6fa8dc;"><b>Constant memory</b></span>: a small chunk of global memory on the device, can be read by all work items on the device (but not written to) and can be read/written by the host. Constant memory is slightly faster than global memory.</li>
<li><b><span style="color: #6fa8dc;">Local memory</span> (similar to cache memory on the CPU)</b>: memory shared among work items in the same work group (work items executing together on the same compute unit are grouped into work groups). Local memory allows work items belonging to the same work group to share results. Local memory is much faster than global memory (up to 100x). </li>
<li><b><span style="color: #6fa8dc;">Private memory</span> (similar to registers on the CPU)</b>: the fastest type of memory. Each work item (thread) has a tiny amount of private memory to store intermediate results that can only be used by that work item</li>
</ul>
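<div style="text-align: justify;">
As a hypothetical illustration of this memory hierarchy (the kernel below is a made-up example, not part of the tutorial code), the four levels appear in OpenCL kernel code as the address space qualifiers "__global", "__constant", "__local" and "__private" (the default for kernel-local variables):</div>

```c
// Illustrative kernel sketch showing the four OpenCL address spaces.
// All names and sizes here are made up for demonstration purposes.
__kernel void memory_spaces_demo(__global float* data,     // global memory: read/write by all work items and the host
                                 __constant float* coeffs, // constant memory: read-only on the device
                                 __local float* scratch)   // local memory: shared within one work group
{
    // "tmp" lives in private memory (registers), visible to this work item only
    float tmp = data[get_global_id(0)];
    scratch[get_local_id(0)] = tmp * coeffs[0];
    barrier(CLK_LOCAL_MEM_FENCE); // synchronise the work group before reading shared results
    data[get_global_id(0)] = scratch[get_local_id(0)];
}
```

This fragment is compiled at runtime by the OpenCL driver, so it can't be built standalone; on the host side, a __local argument like "scratch" would typically be set with kernel.setArg and cl::Local(size).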
<div style="text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjWQrpjCDHrx5PBimDOVQMh3NtHqJwOlLeCOAFPUwak3JTvefzw_MSTJGosA1UnDp0fRlH5vOrHIF33WPKFrOfTvABybBjvagedl8j8Dyvseilk-fTkkTAFTOnsf3p9cFVAU1TyokEp6w/s1600/opencl_mem_model.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="257" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjWQrpjCDHrx5PBimDOVQMh3NtHqJwOlLeCOAFPUwak3JTvefzw_MSTJGosA1UnDp0fRlH5vOrHIF33WPKFrOfTvABybBjvagedl8j8Dyvseilk-fTkkTAFTOnsf3p9cFVAU1TyokEp6w/s400/opencl_mem_model.png" width="400" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br />
<span style="color: #6fa8dc;"><span style="font-size: large;"><b>First OpenCL program</b></span></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
With the obligatory theory out of the way, it's time to dive into the code. To get used to the OpenCL syntax, this first program will be very simple (nothing earth-shattering yet): the code will just add the corresponding elements of two floating point arrays together in parallel (all at once).<br />
<br />
In a nutshell, what happens is the following:<br />
<ol>
<li>Initialise the OpenCL computing environment: create a platform, device, context, command queue, program and kernel and set up the kernel arguments</li>
<li>Create two floating point number arrays on the host side and copy them to the OpenCL device</li>
<li>Make OpenCL perform the computation in parallel (by determining global and local worksizes and launching the kernel)</li>
<li>Copy the results of the computation from the device to the host</li>
<li>Print the results to the console</li>
</ol>
To keep the code simple and readable, there is minimal error checking, the "cl" namespace is used for the OpenCL structures and the OpenCL kernel source is provided as a string in the CPU code. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The code contains plenty of comments to clarify the new syntax:</div>
<div style="text-align: justify;">
<br /></div>
<div style="height: 50em; overflow: auto; width: 42em;">
<pre class="prettyprint" style="overflow-x: auto; width: 70em;"><!-- HTML generated using hilite.me --><div style="background: #181818; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: #00cc00;">// Getting started with OpenCL tutorial </span>
<span style="color: #00cc00;">// by Sam Lapere, 2016, http://raytracey.blogspot.com</span>
<span style="color: #00cc00;">// Code based on http://simpleopencl.blogspot.com/2013/06/tutorial-simple-start-with-opencl-and-c.html</span>
<span style="color: blue;">#include <iostream></span>
<span style="color: blue;">#include <vector></span>
<span style="color: blue;">#include <CL\cl.hpp> </span><span style="color: #00cc00;">// main OpenCL include file </span>
<span style="color: blue;">using</span> <span style="color: blue;">namespace</span> cl;
<span style="color: blue;">using</span> <span style="color: blue;">namespace</span> std;
<span style="color: #2b91af;">void</span> main()
{
<span style="color: #00cc00;">// Find all available OpenCL platforms (e.g. AMD, Nvidia, Intel)</span>
vector<Platform> platforms;
Platform::get(&platforms);
<span style="color: #00cc00;">// Show the names of all available OpenCL platforms</span>
cout << <span style="color: #e60000;">"Available OpenCL platforms: \n\n"</span>;
<span style="color: blue;">for</span> (<span style="color: #2b91af;">unsigned</span> <span style="color: #2b91af;">int</span> i = 0; i < platforms.size(); i++)
cout << <span style="color: #e60000;">"\t"</span> << i + 1 << <span style="color: #e60000;">": "</span> << platforms[i].getInfo<CL_PLATFORM_NAME>() << endl;
<span style="color: #00cc00;">// Choose and create an OpenCL platform</span>
cout << endl << <span style="color: #e60000;">"Enter the number of the OpenCL platform you want to use: "</span>;
<span style="color: #2b91af;">unsigned</span> <span style="color: #2b91af;">int</span> input = 0;
cin >> input;
<span style="color: #00cc00;">// Handle incorrect user input</span>
<span style="color: blue;">while</span> (input < 1 || input > platforms.size()){
cin.clear(); <span style="color: #00cc00;">//clear errors/bad flags on cin</span>
cin.ignore(cin.rdbuf()->in_avail(), '\n'); <span style="color: #00cc00;">// ignores exact number of chars in cin buffer</span>
cout << <span style="color: #e60000;">"No such platform."</span> << endl << <span style="color: #e60000;">"Enter the number of the OpenCL platform you want to use: "</span>;
cin >> input;
}
Platform platform = platforms[input - 1];
<span style="color: #00cc00;">// Print the name of chosen OpenCL platform</span>
cout << <span style="color: #e60000;">"Using OpenCL platform: \t"</span> << platform.getInfo<CL_PLATFORM_NAME>() << endl;
<span style="color: #00cc00;">// Find all available OpenCL devices (e.g. CPU, GPU or integrated GPU)</span>
vector<Device> devices;
platform.getDevices(CL_DEVICE_TYPE_ALL, &devices);
<span style="color: #00cc00;">// Print the names of all available OpenCL devices on the chosen platform</span>
cout << <span style="color: #e60000;">"Available OpenCL devices on this platform: "</span> << endl << endl;
<span style="color: blue;">for</span> (<span style="color: #2b91af;">unsigned</span> <span style="color: #2b91af;">int</span> i = 0; i < devices.size(); i++)
cout << <span style="color: #e60000;">"\t"</span> << i + 1 << <span style="color: #e60000;">": "</span> << devices[i].getInfo<CL_DEVICE_NAME>() << endl;
<span style="color: #00cc00;">// Choose an OpenCL device </span>
cout << endl << <span style="color: #e60000;">"Enter the number of the OpenCL device you want to use: "</span>;
input = 0;
cin >> input;
<span style="color: #00cc00;">// Handle incorrect user input</span>
<span style="color: blue;">while</span> (input < 1 || input > devices.size()){
cin.clear(); <span style="color: #00cc00;">//clear errors/bad flags on cin</span>
cin.ignore(cin.rdbuf()->in_avail(), '\n'); <span style="color: #00cc00;">// ignores exact number of chars in cin buffer</span>
cout << <span style="color: #e60000;">"No such device. Enter the number of the OpenCL device you want to use: "</span>;
cin >> input;
}
Device device = devices[input - 1];
<span style="color: #00cc00;">// Print the name of the chosen OpenCL device</span>
cout << endl << <span style="color: #e60000;">"Using OpenCL device: \t"</span> << device.getInfo<CL_DEVICE_NAME>() << endl << endl;
<span style="color: #00cc00;">// Create an OpenCL context on that device.</span>
<span style="color: #00cc00;">// the context manages all the OpenCL resources </span>
Context context = Context(device);
<span style="color: #00cc00;">///////////////////</span>
<span style="color: #00cc00;">// OPENCL KERNEL //</span>
<span style="color: #00cc00;">///////////////////</span>
<span style="color: #00cc00;">// the OpenCL kernel in this tutorial is a simple program that adds two float arrays in parallel </span>
<span style="color: #00cc00;">// the source code of the OpenCL kernel is passed as a string to the host</span>
<span style="color: #00cc00;">// the "__global" keyword denotes that "global" device memory is used, which can be read and written </span>
<span style="color: #00cc00;">// to by all work items (threads) and all work groups on the device and can also be read/written by the host (CPU)</span>
<span style="color: blue;">const</span> <span style="color: #2b91af;">char</span>* source_string =
<span style="color: #e60000;">" __kernel void parallel_add(__global float* x, __global float* y, __global float* z){ "</span>
<span style="color: #e60000;">" const int i = get_global_id(0); "</span> <span style="color: #00cc00;">// get a unique number identifying the work item in the global pool</span>
<span style="color: #e60000;">" z[i] = y[i] + x[i]; "</span> <span style="color: #00cc00;">// add two arrays </span>
<span style="color: #e60000;">"}"</span>;
<span style="color: #00cc00;">// Create an OpenCL program by performing runtime source compilation</span>
Program program = Program(context, source_string);
<span style="color: #00cc00;">// Build the program and check for compilation errors </span>
cl_int result = program.build({ device }, <span style="color: #e60000;">""</span>);
<span style="color: blue;">if</span> (result) cout << <span style="color: #e60000;">"Error during compilation! ("</span> << result << <span style="color: #e60000;">")"</span> << endl;
<span style="color: #00cc00;">// Create a kernel (entry point in the OpenCL source program)</span>
<span style="color: #00cc00;">// kernels are the basic units of executable code that run on the OpenCL device</span>
<span style="color: #00cc00;">// the kernel forms the starting point into the OpenCL program, analogous to main() in CPU code</span>
<span style="color: #00cc00;">// kernels can be called from the host (CPU)</span>
Kernel kernel = Kernel(program, <span style="color: #e60000;">"parallel_add"</span>);
<span style="color: #00cc00;">// Create input data arrays on the host (= CPU)</span>
<span style="color: blue;">const</span> <span style="color: #2b91af;">int</span> numElements = 10;
<span style="color: #2b91af;">float</span> cpuArrayA[numElements] = { 0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f };
<span style="color: #2b91af;">float</span> cpuArrayB[numElements] = { 0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f, 0.9f, 1.0f };
<span style="color: #2b91af;">float</span> cpuOutput[numElements] = {}; <span style="color: #00cc00;">// empty array for storing the results of the OpenCL program</span>
<span style="color: #00cc00;">// Create buffers (memory objects) on the OpenCL device, allocate memory and copy input data to device.</span>
<span style="color: #00cc00;">// Flags indicate how the buffer should be used e.g. read-only, write-only, read-write</span>
Buffer clBufferA = Buffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, numElements * <span style="color: blue;">sizeof</span>(cl_float), cpuArrayA);
Buffer clBufferB = Buffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, numElements * <span style="color: blue;">sizeof</span>(cl_float), cpuArrayB);
Buffer clOutput = Buffer(context, CL_MEM_WRITE_ONLY, numElements * <span style="color: blue;">sizeof</span>(cl_float), NULL);
<span style="color: #00cc00;">// Specify the arguments for the OpenCL kernel</span>
<span style="color: #00cc00;">// (the arguments are __global float* x, __global float* y and __global float* z)</span>
kernel.setArg(0, clBufferA); <span style="color: #00cc00;">// first argument </span>
kernel.setArg(1, clBufferB); <span style="color: #00cc00;">// second argument </span>
kernel.setArg(2, clOutput); <span style="color: #00cc00;">// third argument </span>
<span style="color: #00cc00;">// Create a command queue for the OpenCL device</span>
<span style="color: #00cc00;">// the command queue allows kernel execution commands to be sent to the device</span>
CommandQueue queue = CommandQueue(context, device);
<span style="color: #00cc00;">// Determine the global and local number of "work items"</span>
<span style="color: #00cc00;">// The global work size is the total number of work items (threads) that execute in parallel</span>
<span style="color: #00cc00;">// Work items executing together on the same compute unit are grouped into "work groups"</span>
<span style="color: #00cc00;">// The local work size defines the number of work items in each work group</span>
<span style="color: #00cc00;">// Important: global_work_size must be an integer multiple of local_work_size </span>
std::<span style="color: #2b91af;">size_t</span> global_work_size = numElements;
std::<span style="color: #2b91af;">size_t</span> local_work_size = 10; <span style="color: #00cc00;">// could also be 1, 2 or 5 in this example</span>
<span style="color: #00cc00;">// when local_work_size equals 10, all ten number pairs from both arrays will be added together in one go</span>
<span style="color: #00cc00;">// Launch the kernel and specify the global and local number of work items (threads)</span>
queue.enqueueNDRangeKernel(kernel, NullRange, global_work_size, local_work_size); <span style="color: #00cc00;">// NullRange means no offset into the work items</span>
<span style="color: #00cc00;">// Read and copy OpenCL output to CPU </span>
<span style="color: #00cc00;">// the "CL_TRUE" flag blocks the read operation until all work items have finished their computation</span>
queue.enqueueReadBuffer(clOutput, CL_TRUE, 0, numElements * <span style="color: blue;">sizeof</span>(cl_float), cpuOutput);
<span style="color: #00cc00;">// Print results to console</span>
<span style="color: blue;">for</span> (<span style="color: #2b91af;">int</span> i = 0; i < numElements; i++)
cout << cpuArrayA[i] << <span style="color: #e60000;">" + "</span> << cpuArrayB[i] << <span style="color: #e60000;">" = "</span> << cpuOutput[i] << endl;
system(<span style="color: #e60000;">"PAUSE"</span>);
}
</pre>
</td></tr>
</tbody></table>
</div>
</pre>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This code is also available at<br />
<a href="https://github.com/straaljager/OpenCL-path-tracing-tutorial-1-Getting-started">https://github.com/straaljager/OpenCL-path-tracing-tutorial-1-Getting-started</a><br />
<br /></div>
<div style="text-align: justify;">
<br />
<span style="font-size: large;"><b><span style="color: #6fa8dc;">Compiling instructions</span> </b><span style="font-size: small;">(for Visual Studio on Windows)</span></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
To compile this code, it's recommended to download and install the <b><a href="http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/">AMD App SDK</a></b> (it works on systems with CPUs or GPUs from AMD, Nvidia and Intel, even without any AMD hardware installed), since Nvidia's OpenCL implementation is no longer up-to-date.</div>
<ul style="text-align: justify;">
</ul>
<ol>
<li>Start an empty Console project in Visual Studio (any recent version should work, including Express and Community) and set to <b>Release</b> mode </li>
<li>Add the SDK include path to the "<b>Additional Include Directories</b>" (e.g. "C:\Program Files (x86)\AMD APP SDK\2.9-1\include") </li>
<li>In Linker > Input, add "<b>opencl.lib</b>" to "<b>Additional Dependencies</b>" and add the OpenCL lib path to "<b>Additional Library Directories</b>" (e.g. "C:\Program Files (x86)\AMD APP SDK\2.9-1\lib\x86")</li>
<li>Add the <b>main.cpp</b> file (or create a new file and paste the code) and build the project</li>
</ol>
<ul style="text-align: justify;">
</ul>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="color: #6fa8dc;"><span style="font-size: large;"><b>Download binaries</b></span></span> </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The executable (Windows only) for this tutorial is available at <br />
<a href="https://github.com/straaljager/OpenCL-path-tracing-tutorial-1-Getting-started/releases/tag/v1.0">https://github.com/straaljager/OpenCL-path-tracing-tutorial-1-Getting-started/releases/tag/v1.0</a><br />
<br />
It runs on CPUs and/or GPUs from AMD, Nvidia and Intel. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br />
<span style="color: #6fa8dc;"><b><span style="font-size: large;">Useful References</span></b></span><br />
<br />
- "<b>A gentle introduction to OpenCL</b>":<br />
<a href="http://www.drdobbs.com/parallel/a-gentle-introduction-to-opencl/231002854">http://www.drdobbs.com/parallel/a-gentle-introduction-to-opencl/231002854 </a><br />
<br />
- "<b>Simple start with OpenCL</b>": <br />
<a href="http://simpleopencl.blogspot.co.nz/2013/06/tutorial-simple-start-with-opencl-and-c.html">http://simpleopencl.blogspot.co.nz/2013/06/tutorial-simple-start-with-opencl-and-c.html </a><br />
<br />
- Anteru's blogpost, <b>Getting started with OpenCL</b> (uses old OpenCL API)<br />
<a href="https://anteru.net/blog/2012/11/03/2009/index.html">https://anteru.net/blog/2012/11/03/2009/index.html</a><br />
<br />
- <b>AMD introduction to OpenCL programming</b>:<br />
<a href="http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/01/Introduction_to_OpenCL_Programming-201005.pdf">http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/01/Introduction_to_OpenCL_Programming-201005.pdf</a> <br />
<br />
<br />
<span style="color: #6fa8dc;"><span style="font-size: large;"><b>Up next</b></span></span><br />
<br />
In the next tutorial we'll start rendering an image with OpenCL.</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com4tag:blogger.com,1999:blog-7277449027963623452.post-33798598223477661372016-09-20T05:02:00.001-07:002016-12-01T03:16:54.545-08:00GPU path tracing tutorial 4: Optimised BVH building, faster traversal and intersection kernels and HDR environment lighting<div style="text-align: justify;">
For this tutorial, I've implemented a couple of improvements based on the high performance GPU ray tracing framework of Timo Aila, Samuli Laine and Tero Karras (Nvidia research) which is described in their 2009 paper <a href="https://mediatech.aalto.fi/~samuli/publications/aila2009hpg_paper.pdf">"Understanding the efficiency of ray traversal on GPUs"</a> and the <a href="https://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum">2012 addendum to the original paper</a> which contains specifically hand tuned kernels for Fermi and Kepler GPUs (which also works on Maxwell). The code for this framework is open source and can be found at the <a href="https://code.google.com/archive/p/understanding-the-efficiency-of-ray-traversal-on-gpus/">Google code repository</a> (which is about to be phased out) or on <a href="https://github.com/matt77hias/GPURayTraversal/tree/master/src">GitHub</a>. The ray tracing kernels are thoroughly optimised and deliver state-of-the-art performance (the code from this tutorial is 2-3 times faster than the previous one). For that reason, they are also used in the production grade CUDA path tracer Cycles:<br />
<br />
- <a href="https://wiki.blender.org/index.php/Dev:Source/Render/Cycles/BVH">wiki.blender.org/index.php/Dev:Source/Render/Cycles/BVH</a><br />
<br />
- <a href="https://github.com/doug65536/blender/blob/master/intern/cycles/kernel/kernel_bvh.h">github.com/doug65536/blender/blob/master/intern/cycles/kernel/kernel_bvh.h</a> <br />
<br />
- <a href="https://github.com/doug65536/blender/blob/master/intern/cycles/kernel/kernel_bvh_traversal.h">github.com/doug65536/blender/blob/master/intern/cycles/kernel/kernel_bvh_traversal.h</a> </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The major improvements from this framework are:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- <span style="color: #009900;"><b>Spatial split BVH</b></span>: this BVH building method is based on Nvidia's "<a href="http://www.nvidia.com/docs/IO/77714/sbvh.pdf">Spatial splits in bounding volume hierarchies</a>" paper by Martin Stich. It aims to reduce BVH node overlap (a high amount of node overlap lowers ray tracing performance) by combining the object splitting strategy of regular BVH building (according to a surface area heuristic or SAH) with the space splitting method of kd-tree building. The algorithm determines for each triangle whether "splitting" it (by creating duplicate references to the triangle and storing them in its overlapping nodes) lowers the cost of ray/node intersections compared to the "unsplit" case. The result is a very high quality acceleration structure with ray traversal performance which on average is significantly higher than (or in the worst case equal to) a regular SAH BVH.</div>
<br />
<div style="text-align: justify;">
- <span style="color: #009900;"><b>Woop ray/triangle intersection</b></span>: this algorithm is explained in <a href="http://www.sven-woop.de/papers/2004-GH-SaarCOR.pdf">"Real-time ray tracing of dynamic scenes on an FPGA chip"</a>. It basically transforms each triangle in the mesh to a unit triangle with vertices (0, 0, 0), (1, 0, 0) and (0, 1, 0). During rendering, a ray is transformed into "unit triangle space" using a triangle specific affine triangle transformation and intersected with the unit triangle, which is a much simpler computation. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- <span style="color: #009900;"><b>Hand optimised GPU ray traversal and intersection kernels</b></span>: these kernels use a number of specific tricks to minimise thread divergence within a warp (a warp is a group of 32 SIMD threads which operate in lockstep, i.e. all threads within a warp must execute the same instructions). Thread divergence occurs when one or more threads within a warp follow a different code execution branch, which (in the absolute worst case) could lead to a scenario where only one thread is active while the other 31 threads in the warp are idling, waiting for it to finish. Using "persistent threads" aims to mitigate this problem: when a predefined number of CUDA threads within a warp is idling, the GPU will dynamically fetch new work for these threads in order to increase compute occupancy. The persistent threads feature is used in the original framework. To keep things simple for this tutorial, it has not been implemented as it requires generating and buffering batches of rays, but it is relatively easy to add. Another optimisation to increase SIMD efficiency in a warp is postponing ray/triangle intersection tests until all threads in the same warp have found a leaf node. Robbin Marcus wrote <a href="http://robbinmarcus.blogspot.co.nz/2015/12/real-time-raytracing-part-31.html">a very informative blogpost</a> about these specific optimisations. In addition to these tricks, the Kepler kernel also uses the GPUs video instructions to perform min/max operations (see "renderkernel.cu" at the top).<br />
<br />
UPDATE: an attentive reader (who knows what he's talking about) corrected a mistake in the above paragraph: "Persistent threading on the GPU was designed to work around the slow dynamic load balancing hardware of the time (GTX 260), not to address branch divergence (totally separate issue). Occupancy is again a different issue, related to how many registers your kernel needs versus how many are present in a SM to spawn threads for latency hiding." </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Other new features:</div>
<div style="text-align: justify;">
- a basic OBJ loader which triangulates n-sided faces (n-gons, triangle fans)</div>
<div style="text-align: justify;">
- simple HDR environment map lighting, which for simplicity does not use any filtering (hence the blockiness) or importance sampling yet. The code is based on <a href="http://blog.hvidtfeldts.net/index.php/2012/10/image-based-lighting/">http://blog.hvidtfeldts.net/index.php/2012/10/image-based-lighting/</a> <br />
<br /></div>
<div style="text-align: justify;">
Some renders with the code from this tutorial (the "Roman Settlement" city scene was created by LordGood and converted from <a href="https://3dwarehouse.sketchup.com/model.html?id=2bf8769ad745b52d6eae68c8785f7931">a SketchUp model</a>, also used by Mitsuba Render. The HDR maps are available at the <a href="http://www.hdrlabs.com/sibl/archive.html">HDR Labs website</a>):</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJvcswA0RebXsb-lPp26YTu001zcGnHmgN_KFAAmjHcbme73QFUv46iAy1kruLy8MP9BGnGJ4XooUelk8XBOuNba2cPopzTb9ZGX0Jt-N4j9bgTza-tPfUPWF86iA_OiycV7lfsid-I_Q/s1600/dragonhdr3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJvcswA0RebXsb-lPp26YTu001zcGnHmgN_KFAAmjHcbme73QFUv46iAy1kruLy8MP9BGnGJ4XooUelk8XBOuNba2cPopzTb9ZGX0Jt-N4j9bgTza-tPfUPWF86iA_OiycV7lfsid-I_Q/s400/dragonhdr3.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiu_O2bqqt4gNgi6KFfNtAZqMqzI9z3YqnyBw20qcqCO_cqKFBxcph5ANN1K8fTNxvz0y2bHOAu9E7pZN5KUDK50swv5yfpxjkd-S8x2l9BiWucdJ1qresjDTDCVLcW6B28CQi6B5mQ3JQ/s1600/dragonhdr4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiu_O2bqqt4gNgi6KFfNtAZqMqzI9z3YqnyBw20qcqCO_cqKFBxcph5ANN1K8fTNxvz0y2bHOAu9E7pZN5KUDK50swv5yfpxjkd-S8x2l9BiWucdJ1qresjDTDCVLcW6B28CQi6B5mQ3JQ/s400/dragonhdr4.png" width="400" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHHQ1FNgMmr3UfOTnduE2aHpuj7mKleHNEHlefGI5DSUFF_NLKEV40QrkMyMblDMA1ajxjJimRLXTR_c7jJTDAeVTp4DsJundtFF3MfFxEkiLREkZloE0MlltdkUUVl3-vS4jVJLHPtWg/s1600/dragonhdrgold2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHHQ1FNgMmr3UfOTnduE2aHpuj7mKleHNEHlefGI5DSUFF_NLKEV40QrkMyMblDMA1ajxjJimRLXTR_c7jJTDAeVTp4DsJundtFF3MfFxEkiLREkZloE0MlltdkUUVl3-vS4jVJLHPtWg/s400/dragonhdrgold2.png" width="400" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMDg3b1rd4arXTE9RQZgltsxzNrsWdt2xSiYgN-IvCM2cHalQbKDU_iH55b5l5L0il-rAndlzJoQSYJLKC49pCiVjyQ_Djeb5myUjsY5rq6KcCEDA34HlDNt_jWsChJKWxpQO_uFt9XCE/s1600/dragonglass.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMDg3b1rd4arXTE9RQZgltsxzNrsWdt2xSiYgN-IvCM2cHalQbKDU_iH55b5l5L0il-rAndlzJoQSYJLKC49pCiVjyQ_Djeb5myUjsY5rq6KcCEDA34HlDNt_jWsChJKWxpQO_uFt9XCE/s400/dragonglass.png" width="400" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFzLO3QU0HmaBVA2Taiv757b834UOg6Bv7sJMv2zpx8gbenebXpzvdsXPx1AlwPWo7iMUP_bGXcB84joYtuwiEmivB38qCxG9KS-DJW0g5fpaffjxHn_mVzNBPFFlBu2gDLiY8O98Z-KY/s1600/dragonhdrgold.png" imageanchor="1" style="margin-left: 1em; 
margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFzLO3QU0HmaBVA2Taiv757b834UOg6Bv7sJMv2zpx8gbenebXpzvdsXPx1AlwPWo7iMUP_bGXcB84joYtuwiEmivB38qCxG9KS-DJW0g5fpaffjxHn_mVzNBPFFlBu2gDLiY8O98Z-KY/s400/dragonhdrgold.png" width="400" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIQtS-RI0VDPW9R1ajcVy7Khf9Bq3uiA6vkJr9IT7miQNNd37XMI9_i_VFXVXD4GoohNGvR27aQ85qT-vs9-d2oyPbJo2u-CclXH5bvY3uGhtUD2BRMz1pAtA8a37wVKaERWNGQx8QuC8/s1600/dragonhdricoat.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIQtS-RI0VDPW9R1ajcVy7Khf9Bq3uiA6vkJr9IT7miQNNd37XMI9_i_VFXVXD4GoohNGvR27aQ85qT-vs9-d2oyPbJo2u-CclXH5bvY3uGhtUD2BRMz1pAtA8a37wVKaERWNGQx8QuC8/s400/dragonhdricoat.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGhMQiYZe2JRIRqVkcDfOLYoejt-i555Ova2RxyrydIo-lG1bU4ctW8ZYnsMuwWcxZnMij75Wx4FVqyWGKQ7qcH6m-Zppwrip6iauMWv6UaU3EXWF7rLsWR2MOZTWkYv2NK3hXZW00BOg/s1600/romancity2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGhMQiYZe2JRIRqVkcDfOLYoejt-i555Ova2RxyrydIo-lG1bU4ctW8ZYnsMuwWcxZnMij75Wx4FVqyWGKQ7qcH6m-Zppwrip6iauMWv6UaU3EXWF7rLsWR2MOZTWkYv2NK3hXZW00BOg/s400/romancity2.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjswCVXFueQlavW4tKRpivrh2nftX2-IrjlFpBrmx8tDiyFldzwI7mYlcU1CbzFnuxPZPO3CVBQessI4l2ARF0lXIjzLhMOrSazXutr3oeqAYiRyPfvW0IQSUUbEIw6FxOI5NiqtRQAs4Q/s1600/romancity3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjswCVXFueQlavW4tKRpivrh2nftX2-IrjlFpBrmx8tDiyFldzwI7mYlcU1CbzFnuxPZPO3CVBQessI4l2ARF0lXIjzLhMOrSazXutr3oeqAYiRyPfvW0IQSUUbEIw6FxOI5NiqtRQAs4Q/s400/romancity3.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGvcXpjEfCcWdtCI16LWycRKfS_eJrNmNT-Rfp_wqpAsLwuPMvrt5H1Szl6fiIZwvDojWEWdmpziGUl-0RK3KelPL810Mn3pMx7HEIarteFZrRB5BkV5ZRgg-tMH6IFW1fQBL1lxBQRe8/s1600/romancity6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGvcXpjEfCcWdtCI16LWycRKfS_eJrNmNT-Rfp_wqpAsLwuPMvrt5H1Szl6fiIZwvDojWEWdmpziGUl-0RK3KelPL810Mn3pMx7HEIarteFZrRB5BkV5ZRgg-tMH6IFW1fQBL1lxBQRe8/s400/romancity6.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqW6UNS8BYHdkyxgD1fawHiKitE_ZutGWnc356-E_19_w3kN_qpqY9bcbeKWO2p2it8ZEeAS8glvTRHpnn_EFNkAarVqQJ_GnkZBwoMh4AvFapwoMVyiB5oQo7MYxWNEb1poAnhy8d7Hs/s1600/romancity5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqW6UNS8BYHdkyxgD1fawHiKitE_ZutGWnc356-E_19_w3kN_qpqY9bcbeKWO2p2it8ZEeAS8glvTRHpnn_EFNkAarVqQJ_GnkZBwoMh4AvFapwoMVyiB5oQo7MYxWNEb1poAnhy8d7Hs/s400/romancity5.png" width="400" /></a></div>
<div style="text-align: justify;">
<span style="font-size: large;"><b>Source code</b></span></div>
<div style="text-align: justify;">
<span style="font-size: large;"><b> </b></span> </div>
<div style="text-align: justify;">
The tutorial's source code can be found at <a href="https://github.com/straaljager/GPU-path-tracing-tutorial-4">github.com/straaljager/GPU-path-tracing-tutorial-4</a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For clarity, I've tried to simplify the code where possible, keeping the essential improvements provided by the framework and cutting out the unnecessary parts. I have also added explanatory comments to the most difficult parts of the code. There is quite a lot of new code, but the most important and interesting files are:<br />
<br />
<b>- <span style="color: #cccccc;">SplitBVHBuilder.cpp</span></b> contains the algorithm for building a BVH with spatial splits<br />
<b>- <span style="color: #cccccc;">CudaBVH.cpp</span> </b>shows the particular layout in which the BVH nodes are stored and Woop's triangle transformation method<br />
<b>- <span style="color: #cccccc;">renderkernel.cu</span> </b>demonstrates two methods of ray/triangle intersection: a regular ray/triangle intersection algorithm similar to the one in <a href="http://raytracey.blogspot.co.nz/2016/01/gpu-path-tracing-tutorial-3-take-your.html">GPU path tracing tutorial 3</a>, denoted DEBUGintersectBVHandTriangles(), and a method using Woop's ray/triangle intersection, named intersectBVHandTriangles()
<span style="font-size: large;"><b></b></span><br />
<span style="font-size: large;"><b><br /></b></span></div>
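<div style="text-align: justify;">
Neither Woop's transformation nor the tutorial's actual kernel code is reproduced here, but the "regular" ray/triangle test is close in spirit to the classic M&ouml;ller&ndash;Trumbore algorithm, which can be sketched in standalone C++ (an illustrative sketch, not code from the repository):</div>

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
    Vec3 operator-(const Vec3& b) const { return {x - b.x, y - b.y, z - b.z}; }
};
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Moller-Trumbore ray/triangle test. Returns true and writes the hit
// distance t when the ray (orig + t*dir) crosses triangle (v0, v1, v2).
bool intersectTriangle(const Vec3& orig, const Vec3& dir,
                       const Vec3& v0, const Vec3& v1, const Vec3& v2, float& t)
{
    const float EPS = 1e-7f;
    Vec3 e1 = v1 - v0, e2 = v2 - v0;
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < EPS) return false;   // ray parallel to triangle plane
    float inv = 1.0f / det;
    Vec3 s = orig - v0;
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;   // outside first barycentric range
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * inv;
    return t > EPS;                           // hit must be in front of the origin
}
```

<div style="text-align: justify;">
Woop's method instead bakes an affine transformation into each triangle so the intersection happens against a "unit triangle", which trades memory for a cheaper per-ray test; the kernel in renderkernel.cu shows the real thing.</div>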
<div style="text-align: justify;">
<span style="font-size: large;"><b>Demo</b></span><span style="font-size: large;"><b> </b></span></div>
<div style="text-align: justify;">
<span style="font-size: large;"><b> </b></span> </div>
<div style="text-align: justify;">
A downloadable demo (which requires an Nvidia GPU) is available from <br />
<a href="https://github.com/straaljager/GPU-path-tracing-tutorial-4/releases/">github.com/straaljager/GPU-path-tracing-tutorial-4/releases</a><br />
<br />
</div>
<div style="text-align: justify;">
Working with and learning this ray tracing framework was a lot of fun, head scratching and cursing (mostly the latter). It has given me a deeper appreciation for both the intricacies and strengths of GPUs, and taught me a multitude of ways to optimise CUDA code to maximise performance (even down to the level of assembly/PTX). I recommend anyone who wants to build a GPU renderer to sink their teeth into it (the source code in this tutorial should make it easier to digest the complexities). It keeps astounding me what GPUs are capable of and how much they have evolved in the last decade. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The next tutorial(s) will cover direct lighting, physical sky, area lights, textures and instancing. I've also had a few requests from people who are new to ray tracing for a more thorough explanation of the code from previous tutorials. At some point (when time permits), I hope to create tutorials with illustrations and pseudocode of all the concepts covered.</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com15tag:blogger.com,1999:blog-7277449027963623452.post-52173139790507671862016-06-09T15:13:00.002-07:002016-06-10T20:57:31.127-07:00Real-time path traced Quake 2<div style="text-align: justify;">
Last week, Edd Biddulph released the code and some videos of a very impressive project he's working on: a real-time path traced
version of Quake 2 running on OpenGL 3.3.<br />
<br />
Project link with videos: <a href="http://amietia.com/q2pt.html" target="_blank">http://amietia.com/q2pt.html</a> <br />
Full source code on Github: <a href="https://github.com/eddbiddulph/yquake2/tree/pathtracing" target="_blank">https://github.com/eddbiddulph/yquake2/tree/pathtracing</a></div>
<div style="text-align: justify;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOzVIG8_iwaipLExw_5_ZT1H3zGeg6JY_Ti59sXIpjaeLJK3nmmcQo5_Dt_JUK9sezJ3GjBVNe61vbNGqC4N7f5gUPBQa4sT3gKTz4NG8XmnZMDSckdRvxGAAycU70Coo2THENmw0X5kk/s1600/q2pt1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOzVIG8_iwaipLExw_5_ZT1H3zGeg6JY_Ti59sXIpjaeLJK3nmmcQo5_Dt_JUK9sezJ3GjBVNe61vbNGqC4N7f5gUPBQa4sT3gKTz4NG8XmnZMDSckdRvxGAAycU70Coo2THENmw0X5kk/s400/q2pt1.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Quake 2, now with real-time indirect lighting and soft shadows</td></tr>
</tbody></table>
<div style="text-align: justify;">
The path tracing engine behind this project is quite astonishing when you consider the number of lightsources in the level and the number of dynamic characters (each with a unique pose) that are updated every single frame. I had a very interesting talk with Edd about some of the features of his engine, revealing that he used a lot of clever optimisations (some of which take advantage of specific properties of the Quake 2 engine). </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Copying Edd's answers here:</div>
<div style="text-align: justify;">
<blockquote class="tr_bq">
<b><i>Why Quake 2 instead of Quake 3</i></b></blockquote>
<blockquote class="tr_bq">
I chose Quake 2 because it has area
lightsources and the maps were designed with multiple-bounce lighting in
mind. As far as I know, Quake 3 was not designed this way and didn't
even have area lightsources for the baked lighting. Plus Quake 2's
static geometry was still almost entirely defined by a binary space partitioning tree (BSP) and I
found that traversing a BSP is pretty easy in GLSL and seems to perform
quite well, although I haven't made any comparisons to other approaches.
Quake 3 has a lot more freeform geometry such as tessellated Bezier
surfaces so it doesn't lend itself so well to special optimisations. I'm
a big fan of both games of course :)<br />
<br />
<b><i>How the engine updates dynamic objects</i></b> </blockquote>
<blockquote class="tr_bq">
All dynamic geometry is inserted
into a single structure which is re-built from scratch on every frame.
Each node is an axis-aligned bounding box and has a 'skip pointer' to
skip over the children. I make a node for each triangle and build the
structure bottom-up after sorting the leaf nodes by morton code for
spatial coherence. I chose this approach because the implementation is
simple both for building and traversing, the node hierarchy is quite
flexible, and building is fast although the whole CPU side is
single-threaded for now (mostly because Quake 2 is single-threaded of
course). I'm aware that the lack of ordered traversal results in many
more ray-triangle intersection tests than are necessary, but there is
little divergence and low register usage since the traversal is
stackless.<br />
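<br />
A minimal CPU-side sketch of the skip-pointer idea described above (the node layout here is hypothetical and kept deliberately small &mdash; Edd's version is rebuilt every frame and traversed in GLSL, but the principle is the same: walk the flattened node array linearly and jump over an entire subtree whenever its bounding box is missed):<br />

```cpp
#include <vector>
#include <cfloat>

// Hypothetical flattened layout: nodes stored in depth-first order.
// 'skip' is the array index of the node that follows this node's whole
// subtree (visited when the AABB test fails); 'tri' is >= 0 for leaves.
struct Node {
    float bmin[3], bmax[3];
    int skip;   // next index when the AABB is missed
    int tri;    // triangle index for leaves, -1 for inner nodes
};

// Standard slab test for ray/AABB overlap.
static bool hitAABB(const float o[3], const float invD[3], const Node& n) {
    float t0 = 0.0f, t1 = FLT_MAX;
    for (int a = 0; a < 3; ++a) {
        float tNear = (n.bmin[a] - o[a]) * invD[a];
        float tFar  = (n.bmax[a] - o[a]) * invD[a];
        if (tNear > tFar) { float tmp = tNear; tNear = tFar; tFar = tmp; }
        if (tNear > t0) t0 = tNear;
        if (tFar  < t1) t1 = tFar;
        if (t0 > t1) return false;
    }
    return true;
}

// Stackless traversal: advance linearly; on an AABB miss, jump to 'skip'.
// Returns the triangle indices of every leaf box the ray enters.
std::vector<int> traverse(const std::vector<Node>& nodes,
                          const float o[3], const float d[3]) {
    float invD[3] = {1.0f / d[0], 1.0f / d[1], 1.0f / d[2]};
    std::vector<int> hits;
    int i = 0;
    while (i < (int)nodes.size()) {
        const Node& n = nodes[i];
        if (!hitAABB(o, invD, n)) { i = n.skip; continue; }  // skip subtree
        if (n.tri >= 0) hits.push_back(n.tri);  // leaf: record candidate
        ++i;                                    // inner node hit: first child is next
    }
    return hits;
}
```

Because there is no stack and no ordered descent, register pressure and divergence stay low, at the cost of the extra intersection tests Edd mentions.<br />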
<br />
<b><i>How to keep noise to a minimum when dealing with so many lights</i></b> </blockquote>
<blockquote class="tr_bq">
The light
selection is a bit more tricky. I divided lightsources into two
categories - regular and 'skyportals'. A skyportal is just a
light-emitting surface from the original map data which has a special
texture applied, which indicates to the game that the skybox should be
drawn there. Each leaf in the BSP has two lists of references to
lightsources. The first list references regular lightsources which are
potentially visible from the leaf according to the PVS (potentially visible set) tables. The
second list references skyportals which are contained within the leaf.
At an intersection point the first list is used to trace shadow rays and
make explicit samples of lightsources, and the second list is used to
check if the intersection point is within a skyportal surface. If it's
within a skyportal then there is a contribution of light from the sky.
This way I can perform a kind of offline multiple importance sampling (MIS) because skyportals are
generally much larger than regular lights. For regular lights of course I
use importance sampling, but I believe the weight I use is more
approximate than usual because it's calculated always from the center of
the lightsource rather than from the real sample position on the light.<br />
<br />
One big point about the lights right now is that the pointlights that
the original game used are being added as 4 triangular lightsources
arranged in a tetrahedron so they tend to make quite a performance hit.
I'd like to try adding a whole new type of lightsource such as a
spherical light to see if that works out better.<br />
<br />
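A hypothetical sketch of the approximation described above &mdash; weighting an explicit sample of a triangular lightsource from the light's <i>center</i> rather than from the real sample position (the function name and exact weight are illustrative, not Edd's code; a shadow ray toward the centroid would gate this value):<br />

```cpp
#include <cmath>
#include <algorithm>

struct V3 { float x, y, z; };
static V3 sub(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static V3 cross(V3 a, V3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Unshadowed contribution of one triangular lightsource (v0, v1, v2) with
// constant 'emission', evaluated at surface point p with normal n, using
// the triangle's centroid as the sample point: cheaper but more
// approximate than weighting by the actual sampled position.
float directFromCenter(V3 p, V3 n, V3 v0, V3 v1, V3 v2, float emission) {
    V3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    V3 c  = {(v0.x + v1.x + v2.x) / 3.0f,
             (v0.y + v1.y + v2.y) / 3.0f,
             (v0.z + v1.z + v2.z) / 3.0f};          // light center
    V3 cr = cross(e1, e2);
    float len  = std::sqrt(dot(cr, cr));
    float area = 0.5f * len;
    V3 ln = {cr.x / len, cr.y / len, cr.z / len};   // light normal
    V3 toL = sub(c, p);
    float d2   = dot(toL, toL);
    float dist = std::sqrt(d2);
    V3 wi = {toL.x / dist, toL.y / dist, toL.z / dist};
    float cosSurf  = std::max(0.0f, dot(n, wi));    // receiver cosine
    float cosLight = std::max(0.0f, -dot(ln, wi));  // emitter cosine
    return emission * area * cosSurf * cosLight / d2;  // area-form weight
}
```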
<b><i>Ray tracing specific optimisations</i></b> </blockquote>
<blockquote class="tr_bq">
I'm making explicit light
samples by tracing shadow rays directly towards points on the
lightsources. MIS isn't being
performed in the shader, but I'm deciding offline whether
a lightsource should be sampled explicitly or implicitly.<br />
<br />
<b><i>Which parts of the rendering process use rasterisation</i></b> </blockquote>
<blockquote class="tr_bq">
I use hardware rasterisation only
for the primary rays and perform the raytracing in the same pass for the
following reasons:<br />
<ul>
<li>Translucent surfaces can be lit and can receive shadows identically to all other surfaces.</li>
<li>Hardware anti-aliasing can be used, of course.</li>
<li>Quake 2 sorts translucent BSP surfaces and draws them in a second
pass, but it doesn't do this for entities (the animated objects) so I
would need to change that design and I consider this too intrusive and
likely to break something. One of my main goals was to preserve the
behaviour of Q2's own renderer.</li>
<li>I'm able to eliminate overdraw by making a depth-only pre-pass which
even uses the same GL buffers that the raytracer uses so it has little
overhead except for a trick that I had to make since I packed the three
16-bit triangle indices for the raytracer into two 32-bit elements (this
was necessary due to OpenGL limitations on texture buffer objects).</li>
<li>It's nice that I don't need to manage framebuffer stuff and design a good g-buffer format. </li>
</ul>
</blockquote>
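<div style="text-align: justify;">
The exact packing Edd used isn't spelled out, but fitting three 16-bit triangle indices into two 32-bit texture-buffer elements can be sketched like this (a hypothetical layout, not necessarily the one in the project; the spare high half of the second word is free for other per-triangle data):</div>

```cpp
#include <cstdint>

struct PackedTri { uint32_t w0, w1; };

// Pack three 16-bit vertex indices into two 32-bit words (48 of 64 bits
// used): i0 and i1 share the first word, i2 sits in the low half of the
// second.
PackedTri packIndices(uint16_t i0, uint16_t i1, uint16_t i2) {
    return { uint32_t(i0) | (uint32_t(i1) << 16), uint32_t(i2) };
}

// Reverse the packing on the shader/CPU side.
void unpackIndices(const PackedTri& p, uint16_t& i0, uint16_t& i1, uint16_t& i2) {
    i0 = uint16_t(p.w0 & 0xFFFFu);
    i1 = uint16_t(p.w0 >> 16);
    i2 = uint16_t(p.w1 & 0xFFFFu);
}
```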
<blockquote class="tr_bq">
<b><i>The important project files containing the path tracing code</i></b> </blockquote>
<blockquote class="tr_bq">
If you want to take a look
at the main parts that I wrote, stick to
src/client/refresh/r_pathtracing.c and
src/client/refresh/pathtracer.glsl. The rest of my changes were mostly
about adding various GL extensions and hooking in my stuff to the old
refresh subsystem (Quake 2's name for the renderer). I apologise that
r_pathtracing.c is such a huge file, but I did try to comment it nicely
and refactoring is already on my huge TODO list. The GLSL file is
converted into a C header at build time by stringifyshaders.sh which is
at the root of the codebase.<br />
<br />
<b><i>More interesting tidbits</i></b> </blockquote>
<blockquote class="tr_bq">
- This whole project is only made practical by the fact that the BSP
files still contain surface emission data despite the game itself making
no use of it at all. This is clearly a by-product of keeping the
map-building process simple, and it's a very fortunate one!</blockquote>
<blockquote class="tr_bq">
- The designers of the original maps sometimes placed pointlights in
front of surface lights to give the appearance that they are glowing or
emitting light at their sides like a fluorescent tube diffuser. This
looks totally weird in my pathtracer so I have static pointlights
disabled by default. They also happen to go unused by the original game,
so it's also fortunate that they still exist among the map data. </blockquote>
<blockquote class="tr_bq">
- The weapon that is viewed in first-person is drawn with a 'depth hack'
(it's literally called RF_DEPTHHACK), in which the range of depth
values is reduced to prevent the weapon poking into walls.
Unfortunately the pathtracer's representation would still poke into
walls because it needs the triangles in worldspace, and this would cause
the tip of the weapon to turn black (completely in shadow). I worked
around this by 'virtually' scaling down the weapon for the pathtracer.
This is one of the many ways in which raytracing turns out to be tricky
for videogames, but I'm sure there can always be elegant solutions.</blockquote>
</div>
<div style="text-align: justify;">
If you want to mess around with the path traced version of Quake 2 yourself (both AMD and Nvidia cards are supported as the path tracer uses OpenGL), simply follow these steps:<br />
<ul>
<li>on Windows, follow the steps under section 2.3 in the readme file (link: <a href="https://github.com/eddbiddulph/yquake2/blob/pathtracing/README">github.com/eddbiddulph/yquake2/blob/pathtracing/README</a>). Lots of websites still offer the Quake 2 demo for download (e.g. <a href="http://www.ausgamers.com/files/download/314/quake-2-demo">ausgamers.com/files/download/314/quake-2-demo</a>)</li>
<li>download and unzip the Yamagi Quake 2 source code with path tracing from <a href="https://github.com/eddbiddulph/yquake2">github.com/eddbiddulph/yquake2</a></li>
<li>following the steps under section 2.6 of the readme file, download and extract the premade MinGW build environment, run MSYS32, navigate to the source directory with the makefile, "make" the release build and replace the files "q2ded.exe", "quake2.exe" and "baseq2\game.dll" in the Quake 2 game installation with the freshly built ones</li>
<li>start the game by double clicking "quake2", open the Quake 2 console with the ~ key (under the ESC key), type "gl_pt_enable 1", hit Enter and the ~ key to close the console</li>
<li>the game should now run with path tracing</li>
</ul>
<br />
Edd also said he's planning to add new special path tracing effects (such as light-emitting particles from the railgun) and to implement more optimisations to reduce the path tracing noise. </div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com6tag:blogger.com,1999:blog-7277449027963623452.post-40201509949791259162016-05-17T17:57:00.002-07:002016-05-17T19:55:33.086-07:00Start your engines: source code for FireRays (AMD's high performance OpenCL based GPU ray tracing framework) available<div style="text-align: justify;">
AMD has just released <a href="http://gpuopen.com/firerays-2-0-open-sourcing-and-customizing-ray-tracing/">the full source code of FireRays</a>, their OpenCL based GPU renderer, which has been available as an SDK library since August 2015 (see <a href="http://raytracey.blogspot.co.nz/2015/08/firerays-amds-opencl-based-high.html">this earlier post</a>). This is an outstanding move by AMD which significantly lowers the threshold for developers to enter the GPU rendering arena and create an efficient OpenCL based path tracing engine that runs on hardware from AMD, Intel and Nvidia without extra effort. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Here's an ugly sample render of FireRays provided by AMD:</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtmwGzzu6SqmF-r8_FaQf1VxdfJYOsPwaJXPZ20Vo2UScX7HzpXZVPgJ1MnkP_o38H0bisZWXH8ZLnea5SYAjSeQEjO_mhf0V98UHblGX0M29C2zbct7L44CXH6NmA7joIxuqHN4Q67Jw/s1600/firerays.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="230" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtmwGzzu6SqmF-r8_FaQf1VxdfJYOsPwaJXPZ20Vo2UScX7HzpXZVPgJ1MnkP_o38H0bisZWXH8ZLnea5SYAjSeQEjO_mhf0V98UHblGX0M29C2zbct7L44CXH6NmA7joIxuqHN4Q67Jw/s400/firerays.png" width="400" /></a></div>
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
<br />
And an old video from one of the developers:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/XbwibIbIBkE/0.jpg" frameborder="0" height="400" src="https://www.youtube.com/embed/XbwibIbIBkE?feature=player_embedded" width="480"></iframe></div>
<br />
Nvidia open sourced their <a href="https://code.google.com/archive/p/understanding-the-efficiency-of-ray-traversal-on-gpus/">high performance CUDA based ray tracing framework</a> in 2009, but hasn't updated it since 2012 (presumably due to the lack of any real competition from AMD in this area) and has since focused more on developing OptiX, a CUDA based closed source ray tracing library. Intel open sourced <a href="https://embree.github.io/">Embree</a> in 2011, which is being actively developed and updated with new features and performance improvements. They even released another open source high performance ray tracer for scientific visualisation called <a href="http://www.ospray.org/">OSPRay</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
FireRays seems to have some advanced features such as ray filtering, geometry and ray masking (to make certain objects invisible to the camera or selectively ignore effects like shadows and reflections) and support for volumetrics. Hopefully AMD will also release some in-depth documentation and getting started tutorials in order to maximise adoption of this new technology among developers who are new to GPU ray tracing. </div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com7tag:blogger.com,1999:blog-7277449027963623452.post-57793044538690627722016-03-08T07:11:00.002-08:002016-03-08T08:06:53.027-08:00Free introductory minibook on ray tracing<div style="text-align: justify;">
Peter Shirley has just released "<a href="http://www.amazon.com/gp/product/B01CO7PQ8C/">Ray Tracing: The Next Week</a>", a free book on Amazon for anyone who wants to learn how to code a basic path tracer in C: <a href="http://psgraphics.blogspot.be/2016/03/new-ray-tracing-mini-book-is-out.html">http://psgraphics.blogspot.be/2016/03/new-ray-tracing-mini-book-is-out.html</a><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgw8F_3pv5YzxcO0omVcrZrFOvwB3AJXWViqK1n1jtLsZnCYHM0FNJlND9YhDcaVXwuxJwqjw63I1WquOeZtT7qThVnYlb-hQyYqV1ISr2gQsbGIiS7cE7UcoeyTZb_3XRauTQU3R8obQU/s1600/raytracingnextweek.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgw8F_3pv5YzxcO0omVcrZrFOvwB3AJXWViqK1n1jtLsZnCYHM0FNJlND9YhDcaVXwuxJwqjw63I1WquOeZtT7qThVnYlb-hQyYqV1ISr2gQsbGIiS7cE7UcoeyTZb_3XRauTQU3R8obQU/s400/raytracingnextweek.jpg" width="250" /></a></div>
<br /></div>
<div style="text-align: justify;">
"Ray tracing the next week" is a follow-up to another mini-book by Shirley, "<a href="http://in1weekend.blogspot.be/2016/01/ray-tracing-in-one-weekend.html">Ray tracing in one weekend</a>" which was released only last month and covers the very basics of a ray tracer including ray-sphere intersection, path tracing of diffuse, metal and dielectric materials, anti-aliasing, positionable camera and depth-of-field. The Kindle edition is available for free when downloaded within the next five days (until 11 March). The book is excellent for people who quickly want to dive into coding a path tracer from scratch without being overwhelmed by theoretical details. It covers more advanced features such as solid textures, image textures, participating media, motion blur, instancing, and BVH acceleration structures and comes with source code snippets (using C plus classes and operator overloading, easily portable to CUDA). The code even contains some simple but clever optimisation tricks which are not published in any other ray tracing books.</div>
Sam Laperehttp://www.blogger.com/profile/05688552048697970050noreply@blogger.com4