Saturday, March 27, 2010

Fermi: a raytracing BEAST!!! (and FYC: Fuck You Charlie :-)

Fermi launched yesterday and it redefined the meaning of realtime raytracing/pathtracing. Everyone knew it was going to excel at CUDA computing tasks such as raytracing, but this is really nuts: Anandtech benched Nvidia's latest Optix demo, called Design Garage, which features sports cars rendered with path tracing, and the benchmark results are totally off the charts: Anand measured an 870% improvement in pathtracing performance going from the GTX285 to the GTX480. Damn... I bet it flies in voxel raycasting too.

This is excellent news for all the GPU based renderers.

"Design Garage" is a really cool app and remarkably complete for a tech demo. It's probably the coolest tech demo that I've seen since the ATI Ruby demo for the HD4800 series (only this time it's playable ;-). Congratz to Nvidia and the Optix team for innovating and pushing GPU technology to the limit (despite the problems)! And fuck you Charlie (ah it feels so good :-)! Now if someone could just make a simple low res interactively pathtraced game with Optix...

Thursday, March 11, 2010

SVO and path tracing update

A few months ago, I started thinking about extending Sparse Voxel Octree raycasting with realtime GPU accelerated path tracing as a proof of concept. Well, no need for that anymore, as the guys behind Voxelstein 3D have beaten me to it.

Hans Krieg, the (only?) programmer of the Voxelstein 3D engine, has incorporated a path tracing extension into the engine, which uses SVO and BIH (bounding interval hierarchy) to render the voxels. The code runs on both CPU and GPU (CUDA). The first images are rather crude, but have a warm and natural global illumination to them.

For pictures and details about the engine, visit

UPDATE: Samuli Laine has written an excellent paper entitled "Efficient Sparse Voxel Octrees" which uses a nifty trick to make the contours of the voxels much less blocky. On top of that, he posted a video and the complete open sourced CUDA code of his technique as well. All can be found, read, watched on . Fantastic work and probably the most practical and "efficient" implementation of SVOs yet, ready to be abused by graphics engineers :-)

OTOY this summer?

Excellent must read article about OTOY:

Apparently OTOY is building multiple supercomputers (Fusion Render Clouds) with AMD and Supermicro which are expected to launch this summer. Wooooot!!

The hardware itself is quite intimidating. A supercomputer will consist of 128 servers, with a total of 250 AMD “Magny-Cours” Opteron microprocessors and 500 graphics chips based on AMD’s Cypress designs. Each of those graphics chips can process 2.7 teraflops, or 2.7 trillion math operations per second. Each supercomputer could serve 3,000 high-definition users, or 12,000 standard-definition users. Otoy’s own software on a consumer’s own machine is tiny, taking up just four kilobytes of data.
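To get a feel for those figures, here is a quick back-of-the-envelope sketch in Python. It only uses the numbers quoted above; the even split of users across GPUs is my own simplifying assumption (the article doesn't say how the load is actually distributed), and the teraflops figure is presumably a theoretical peak.

```python
# Figures quoted in the article above.
GPUS = 500
TFLOPS_PER_GPU = 2.7   # presumably peak throughput
HD_USERS = 3000
SD_USERS = 12000


def cluster_summary():
    """Derive aggregate numbers for one render cloud.

    Assumption (mine, not the article's): users are spread evenly
    over all GPUs and every GPU runs at its quoted peak.
    """
    total_tflops = GPUS * TFLOPS_PER_GPU
    return {
        "total_pflops": total_tflops / 1000.0,
        "hd_users_per_gpu": HD_USERS / GPUS,
        "sd_users_per_gpu": SD_USERS / GPUS,
        "gflops_per_hd_user": total_tflops * 1000.0 / HD_USERS,
    }


if __name__ == "__main__":
    for name, value in cluster_summary().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions one supercomputer adds up to about 1.35 petaflops, with roughly 6 HD users (or 24 SD users) sharing each GPU.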

Check out their revamped website as well: Finally some action :-D !

This press release summarizes it all:

I'm just hoping that the awesome voxel raycasting technology first seen in the Cinema 2.0/Ruby demo for the HD4800 series in June 2008 (sheesh, that was 21 months ago!!!!), will be used as well on these cloud supercomputers, as it would be an ideal fit and instant justification for the concept of cloud rendering.

In other news, OnLive is also launching its cloud gaming service on June 17th! Are we going to have a cloudy summer?

Tuesday, March 2, 2010

OTOY teams up with SolidWorks

Nice bit of information about OTOY (finally). Thanks to the blog!

Otoy's cloud technology

Dassault Systèmes SolidWorks frustrated the media at SolidWorks World 2010 by being vague on the technical details of their cloud-based CAD, despite it apparently being under development for three years. So I was happy to speak with Jules Urbach of OTOY, the CEO of the company behind the curtain.

It was OTOY technology that powered the SolidWorks-in-the-cloud demo. OTOY uses a different approach from that of other technology providers, such as VisualTao (renamed PlanPlatform, renamed Autodesk Israel, recently acquired by Autodesk for its Project Butterfly for online co-editing of AutoCAD files).

The primary problem with running software on the cloud is latency -- the delay between the distant server and your computer. Latency is a function of bandwidth (how fast is the Internet connection?), distance (how far is the server from your computer?), resolution-quality of the screen images (how much data needs to be sent to your computer?), and the processing speed on the server (how quickly can the data be generated?).

Mr Urbach has been working on this problem for a decade, originally developing just such a system for playing video games over the Internet on behalf of entertainment companies like Nickelodeon. For the last couple of years, though, he's been working with AMD to deliver very high resolution images very quickly over even relatively slow connections -- which solves most of the problems associated with latency.
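A minimal Python sketch of how those four factors could add up. This is my own toy model, not anything OTOY has published: the function name, the additive form and the default signal speed (about 200,000 km/s, a common rule of thumb for light in optical fiber) are all assumptions.

```python
def estimated_latency_ms(distance_km, frame_bytes, bandwidth_bits_per_s,
                         server_ms, signal_km_per_s=200_000.0):
    """Toy additive latency model (hypothetical, for illustration only).

    distance_km          -- one-way distance between client and server
    frame_bytes          -- size of one compressed frame (resolution/quality)
    bandwidth_bits_per_s -- speed of the Internet connection
    server_ms            -- time the server needs to render and encode a frame
    """
    # Input travels to the server, the frame travels back: one round trip.
    propagation_ms = 2 * distance_km / signal_km_per_s * 1000.0
    # Time to push one frame through the connection.
    transfer_ms = frame_bytes * 8 / bandwidth_bits_per_s * 1000.0
    return propagation_ms + transfer_ms + server_ms


if __name__ == "__main__":
    # Made-up example: 1,000 km away, 50 kB frames, 10 Mbit/s line, 1 ms server time.
    print(estimated_latency_ms(1000, 50_000, 10_000_000, 1.0))
```

With these made-up inputs the transfer term (about 40 ms) dominates the total of roughly 51 ms, which hints at why heavy compression matters so much.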

How OTOY Technology Works

The solutions are to (a) greatly compress images and (b) generate images at very high speeds on very low cost "computers." Compression is merely a software problem; the high-speed-but-cheap computing is made possible by AMD's new RV770 GPU with its 800 stream processing units and 2GB of 256-bit RAM boasting a bandwidth of 115GB/sec. "We're talking pennies per vector core doing parallel processing," Mr. Urbach told me.
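As a quick check on the memory figures, here is a small Python sketch relating bus width and bandwidth. I'm assuming that "256-bit RAM" refers to the width of the memory bus and that a GB is 10^9 bytes; neither is spelled out in the article.

```python
def effective_data_rate_gtps(bandwidth_gb_per_s, bus_width_bits):
    """Per-pin transfer rate (gigatransfers/s) implied by a quoted bandwidth.

    Assumes bandwidth = (bus width in bytes) x (transfers per second).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_per_s / bytes_per_transfer


if __name__ == "__main__":
    print(effective_data_rate_gtps(115, 256))
```

Under those assumptions, 115GB/sec over a 256-bit bus works out to roughly 3.6 gigatransfers per second on each pin.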

The software-hardware combination can deliver real-time encoding of up to 3840x2160 resolution. For the more typical 1080p display, OTOY generates a frame every millisecond -- that's 1,000 frames per second. Indeed, he envisions running BluRay video from the cloud on iPhones and other devices, complete with all BluRay menuing systems.
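The frame-rate arithmetic is easy to sketch in Python. The idea that one encoder's output could be divided evenly over several client streams is purely my assumption for illustration; the article doesn't say how the 1,000 frames per second are actually used.

```python
def frame_time_ms(fps):
    """Time budget per frame, in milliseconds."""
    return 1000.0 / fps


def streams_per_encoder(encoder_fps, stream_fps):
    """Whole client streams one encoder could feed (hypothetical even split)."""
    return int(encoder_fps // stream_fps)


def pixel_rate(width, height, fps):
    """Pixels generated per second at a given resolution and frame rate."""
    return width * height * fps


if __name__ == "__main__":
    print(frame_time_ms(1000))              # one frame per millisecond
    print(streams_per_encoder(1000, 60))    # 60 fps clients
    print(streams_per_encoder(1000, 30))    # 30 fps clients
    print(pixel_rate(1920, 1080, 1000))     # 1080p at 1,000 fps
```

At 1080p, a frame every millisecond means a bit over 2 billion pixels per second, which under the even-split assumption would be enough for 16 clients at 60 fps or 33 at 30 fps.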

The one problem he cannot completely solve is distance; it is desirable to keep latency under 16msec, for which the maximum distance should be about a thousand miles. But even with a server in San Francisco and the client in New York (3,000 miles), the delay is just 85msec; to Japan, about 100msec. He gets excited about the possibilities of applying his technology to ultrahigh bandwidth countries like Korea. (Companies like Akamai specialize in hosting replicated data in centers distributed around the world to cut latency for clients like CNN.)
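The distance figures can be sanity-checked with a few lines of Python. I'm assuming signals travel at about 200,000 km/s (roughly two thirds of the speed of light, a common rule of thumb for optical fiber) along a straight line between client and server; real routes are longer and add switching delays, so treat the results as lower bounds.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_S = 200_000.0  # assumption: about 2/3 of the speed of light


def round_trip_ms(miles, km_per_s=FIBER_KM_PER_S):
    """Best-case round-trip propagation delay for a straight-line link."""
    return 2 * miles * KM_PER_MILE / km_per_s * 1000.0


if __name__ == "__main__":
    for miles in (1000, 3000):
        print(f"{miles} miles: {round_trip_ms(miles):.1f} ms round trip")
```

Under that assumption a thousand miles costs about 16 ms for the round trip, in line with the figure quoted above, while 3,000 miles costs about 48 ms. The gap between that and the quoted 85msec would then be routing, encoding and other overhead.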

more here: