Ray Tracer samples in the June 2008 CTP

The June 2008 Community Technology Preview (CTP) of Parallel Extensions to the .NET Framework was released on Monday, and we’re really pleased with the level of excitement we’re seeing in the community in response. As part of the CTP, we included a variety of demos and samples to help provide a tour of the functionality. If you haven’t already, please read the blog entries “Released! Parallel Extensions to the .NET Framework June 2008 CTP”, “Known Issues in the June 2008 CTP of Parallel Extensions”, and “What’s New in the June 2008 CTP of Parallel Extensions”.

In this post, I’m going to explore the four separate ray tracer implementations in the samples that demonstrate how to use Parallel Extensions to achieve concurrent rendering (note that the ray tracing engine itself is not optimized, as it is intended more for discussion than for performance). There are two forms of the base sequential ray tracer, both developed by Luke Hoban: one written in C# using standard loops and other imperative programming constructs, and another implemented in C# via some huge LINQ queries that define the entire execution (read about this implementation at https://blogs.msdn.com/lukeh/archive/2007/04/03/a-ray-tracer-in-c-3-0.aspx).

The samples in the CTP extend these examples in the following ways:

1. The original C# ray tracer is converted to use Parallel Extensions (both the Task Parallel Library and some of the new Coordination Data Structures) to achieve concurrency. This enhanced version was then ported to Visual Basic and F#, demonstrating that Parallel Extensions is usable from any .NET language. See …\Samples\RayTracer\…

2. The LINQ-style ray tracer is converted to use PLINQ to achieve concurrency. See …\Samples\LINQRayTracer

These ray tracer samples involve a fair number of concurrency primitives provided by Parallel Extensions:

System.Threading.Parallel

System.Threading.Task

System.Threading.TaskManager

System.Threading.TaskManagerPolicy

System.Threading.LazyInit<T>

System.Threading.Collections.IConcurrentCollection<T>

System.Threading.Collections.ConcurrentQueue<T>

System.Linq.ParallelQuery

These samples also involve some standard .NET concurrency primitives and patterns:

System.Threading.Interlocked

System.Windows.Forms.Control.BeginInvoke

System.Threading.Monitor (via the lock {…} construct)

Let’s start by exploring the C# ray tracer. Load up Samples\RayTracer\C#\RayTracer.sln, compile and run. If you receive errors regarding System.Threading, please ensure that you have the .NET Framework 3.5 and the Parallel Extensions June 2008 CTP installed. Once the app is running, press the Start button and the ray tracer animation will run in single-threaded mode:

Note that the frames-per-second is displayed in the form’s title bar, and that there are controls to Start/Stop the animation and to turn on parallel rendering. The “Parallel” checkbox enables switching back and forth between sequential and parallel rendering, while the “Show Threads” option (only available when using parallel rendering) performs color-adjustment on the rendered pixels to indicate which thread each pixel was rendered on. Let’s turn on Parallel computation with “Show Threads” and observe the effect, shown here on a quad-core PC:

In this screenshot, we can clearly see four separate colorings, which represent the four underlying threads that are calculating pixel values for the rendered image. Notice also that the allocation of pixels to threads is row-based and that the threads do not receive equal numbers of rows. This is because Parallel.For() dynamically assigns work to the worker threads to keep the computational load balanced. The application code itself specifies that the work should be split up row by row, as this is a good balance between fine-grained work decomposition and the overheads inherent in spooling out small work items. That is, if the work were decomposed into per-pixel work items, the extra overheads would hurt performance.

As it is, parallelizing the row-by-row decomposition provides a 3x improvement when the colorization is turned off. This is a good result, as there is a non-trivial amount of work that cannot be parallelized: even though the timing covers only the image calculation, drawing the image to the screen and refreshing the form eat into the available cycles and reduce the speed-up factor. If drawing the image is omitted, the speed-up is between 3.5x and 4x, as expected and hoped for. The moral here is that drawing more images to the screen increases the overheads of the parallel renderer, and that we should always keep Amdahl’s Law (http://en.wikipedia.org/wiki/Amdahl's_law) in mind when assessing parallel improvements.
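As a rough worked example, Amdahl’s Law can be turned around to estimate how much of the measured work was actually running in parallel. This is only a sketch: it assumes the simple textbook model, in which a fraction p of the work parallelizes perfectly over n cores and the rest is serial, and it treats the speed-ups quoted above as exact, which they are not.

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}
\qquad\Longrightarrow\qquad
p = \frac{1 - 1/S}{1 - 1/n}

% Quad-core machine (n = 4), measured speed-up S = 3:
p = \frac{1 - 1/3}{1 - 1/4} = \frac{2/3}{3/4} = \frac{8}{9} \approx 0.89

% With drawing omitted, taking S = 3.5:
p = \frac{1 - 2/7}{3/4} = \frac{20}{21} \approx 0.95
```

Under these assumptions, roughly 11% of the timed work behaves as if it were serial when the image is drawn, and about 5% when it is not, which is consistent with the drawing and form refresh competing for cycles.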

The key rendering loops are RenderSequential(), RenderParallel(), and RenderParallelShowingThreads(), all of which appear in Raytracer.cs. Comparing the first two, we see that the only difference is a simple rewriting of the outer loops:

for (int y = 0; y < screenHeight; y++) { /* funcBody */ }

Parallel.For(0, screenHeight, y => /* funcBody */);

A critical requirement of any parallel application is that the concurrent work (such as the funcBody in the parallel version) be thread-safe. This means that TraceRay() and any other code called in the parallel loop must not alter shared state unless appropriate thread synchronization is used or the updates are known to be safe. You will note that the funcBody delegates do alter a shared data structure, rgb[], but each element in this array is written by only a single execution of the funcBody, so no conflict arises. The developer must always be vigilant that all concurrent work can occur without conflicts. If in doubt, simplify the concurrent work, use locks around risky code, or analyze and test the code to ensure correctness.
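To make the “each element is written by only one iteration” argument concrete, here is a minimal sketch. It is not the sample’s code: it assumes the Parallel.For that later shipped in System.Threading.Tasks (the CTP namespaces differ), and Shade() is a hypothetical stand-in for TraceRay() that depends only on its arguments.

```csharp
using System;
using System.Threading.Tasks;

static class RowWriteSketch
{
    // Hypothetical stand-in for TraceRay(): a pure function of (x, y)
    // that reads no shared mutable state.
    static int Shade(int x, int y)
    {
        return (x * 31 + y * 17) & 0xFFFFFF;
    }

    public static int[] RenderSequential(int width, int height)
    {
        var rgb = new int[width * height];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                rgb[x + y * width] = Shade(x, y);
        return rgb;
    }

    public static int[] RenderParallel(int width, int height)
    {
        var rgb = new int[width * height];
        // Iteration y writes only the indices stride .. stride + width - 1,
        // so no two iterations ever touch the same array element and no
        // lock is needed around the write.
        Parallel.For(0, height, y =>
        {
            int stride = y * width;
            for (int x = 0; x < width; x++)
                rgb[x + stride] = Shade(x, y);
        });
        return rgb;
    }
}
```

Comparing the two buffers element by element is a cheap test for this kind of loop: if the row bodies really are independent, the results are identical however the rows are scheduled.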

The application also makes use of a single System.Threading.Task to manage the whole process of running the animation, and to set the options regarding parallelism via a TaskManager. The TaskManager that is provided to this task becomes the TaskManager.Current for every task that is created inside it, and so the configuration options also apply to the Parallel.For() that runs within the task (since Parallel.For() is built using Task, and Task instances by default use TaskManager.Current). Wrapping the animation loop in a task also provides a simple means of tracking whether the animation is running and of cancelling it. Diving into btnStartStop_Click() in MainForm.cs, we see:

_renderTask = Task.Create(RenderLoop,
    chkParallel.Checked ? _parallelTm.Value : _sequentialTm.Value);

which creates the task to wrap the RenderLoop() and supplies a TaskManager to control the options for parallel execution. The first TaskManager, _parallelTm, will schedule one thread on every core of the machine, whereas _sequentialTm uses only one thread on one core in order to simulate a simple sequential implementation. Each of these variables is actually a LazyInit<TaskManager>, which ensures that the underlying TaskManager is created only if the variable is used. This is a minor usage of LazyInit<T>, but the construct can be very useful for object creations that are expensive and avoidable; in this example, if you never run a parallel render, you’ll never instantiate the parallel TaskManager. Keep this handy construct in mind for objects that might go unused.
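The deferred-creation idea can be sketched as follows. This is an illustration under assumptions rather than the sample’s code: it uses System.Lazy<T>, the type that shipped in .NET Framework 4, instead of the CTP’s LazyInit<T>, and CostlyResource is a hypothetical class standing in for an expensive object such as a TaskManager.

```csharp
using System;

// Hypothetical stand-in for an object that is expensive to construct.
sealed class CostlyResource
{
    public CostlyResource(Action onCreate)
    {
        onCreate(); // report each construction to the caller
    }
}

static class LazySketch
{
    // Returns { constructions before first use, constructions after two uses }.
    public static int[] CountCreations()
    {
        int created = 0;

        // Nothing is constructed here; only the factory delegate is stored.
        var lazy = new Lazy<CostlyResource>(
            () => new CostlyResource(() => created++));

        int beforeUse = created;

        // The first read of Value runs the factory; the second read
        // returns the instance that was cached by the first.
        CostlyResource first = lazy.Value;
        CostlyResource second = lazy.Value;
        if (!ReferenceEquals(first, second))
            throw new InvalidOperationException("expected a single cached instance");

        return new[] { beforeUse, created };
    }
}
```

If Value is never read, the factory never runs, which is the behavior the sample relies on when only one of the two TaskManager instances is ever needed.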

The code for handling the “stop-animation” action includes the following:

_renderTask.ContinueWith(delegate
{
    BeginInvoke((Action)delegate
    {
        chkParallel.Enabled = true;
        chkShowThreads.Enabled = chkParallel.Checked;
        btnStartStop.Enabled = true;
        btnStartStop.Text = "Start";
    });
});

_renderTask.Cancel();

This sets a continuation-handler for the task that is wrapping the RenderLoop(), and this continuation cleans up the GUI and re-enables various controls. Notice that we must use MainForm.BeginInvoke() in order to run the clean-up code on the GUI thread, as tasks run on a background worker thread. After the cleanup continuation is registered, we call Task.Cancel(). Notice that the RenderLoop() is checking for t.IsCanceled in order to detect the cancellation and co-operatively bail out of its animation loop. The standard overload for Task.ContinueWith() will run the continuation whenever the task finishes, whether due to successful termination, failure or cancellation, but there are additional overloads to control when the continuation fires and other task-related options.
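The register-a-continuation-then-cancel pattern can be sketched outside a form. The code below is an assumption-laden illustration, not the sample: it uses the task API as it later shipped in .NET Framework 4, where cooperative cancellation goes through a CancellationTokenSource rather than the CTP’s Task.Cancel() and IsCanceled, and a plain string variable stands in for the GUI clean-up that the sample marshals through BeginInvoke().

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CancelSketch
{
    public static string RunAndCancel()
    {
        var cts = new CancellationTokenSource();
        var started = new ManualResetEventSlim(false);
        string status = "running";

        // Stand-in for RenderLoop(): polls the token and exits cooperatively.
        Task loop = Task.Factory.StartNew(() =>
        {
            started.Set();
            while (!cts.Token.IsCancellationRequested)
                Thread.Sleep(1); // a real loop would render a frame here
        });

        // Register the clean-up first; this overload runs however the task ends.
        Task cleanup = loop.ContinueWith(t => { status = "stopped"; });

        started.Wait();  // make sure the loop is actually running
        cts.Cancel();    // then ask it to stop
        cleanup.Wait();  // the continuation has finished once this returns
        return status;
    }
}
```

Registering the continuation before requesting cancellation mirrors the order used in the sample, so the clean-up is already attached by the time the loop notices the request.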

The sample uses an interesting scheme for managing the bitmaps it renders into. Allocating the image bitmaps is expensive, but we don’t want to wait for the GUI to display one bitmap before we start computing the next. One way to solve this is a classic double-buffered solution with one bitmap for scene drawing and another that is ready for blitting to the screen (this was the approach taken in the sample included with the December 2007 CTP). A more general solution is to keep rendering images at the maximum rate and add them to a queue for blitting. This solution has been implemented using a set of bitmaps for drawing, each of which is sent for blitting by calling Form.BeginInvoke() with an appropriate delegate. In order to reuse bitmaps rather than creating and disposing of them repeatedly, an ObjectPool<T> (see ObjectPools.cs) has been built around a ConcurrentQueue<T>; the pool hands out a reusable object when one is available and creates a new one otherwise. The ConcurrentQueue<T> is particularly useful as it provides all the machinery required to make the ObjectPool<T> thread-safe.
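A pool of this shape is small enough to sketch. The class below is a hypothetical reconstruction from the description above, not the contents of ObjectPools.cs: the names PoolSketch, Get and Put are invented for illustration, and it assumes the ConcurrentQueue<T> that shipped in System.Collections.Concurrent rather than the CTP’s System.Threading.Collections namespace.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical pool: reuse a returned object if one is queued,
// otherwise build a new one with the supplied factory.
sealed class PoolSketch<T>
{
    private readonly ConcurrentQueue<T> _items = new ConcurrentQueue<T>();
    private readonly Func<T> _create;

    public PoolSketch(Func<T> create)
    {
        if (create == null) throw new ArgumentNullException("create");
        _create = create;
    }

    public T Get()
    {
        T item;
        // TryDequeue is atomic, so two threads can never receive the same
        // queued object; no additional locking is needed here.
        return _items.TryDequeue(out item) ? item : _create();
    }

    public void Put(T item)
    {
        _items.Enqueue(item); // make the object available for reuse
    }
}
```

In the ray tracer’s setting, the render loop would call Get() for a bitmap to draw into, and the GUI delegate would call Put() once the bitmap has been blitted.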

The C# ray tracer has been ported to Visual Basic and F#, including all of the items discussed above except for image colorization. To build and run the F# version, download the F# language support pack from http://research.microsoft.com/fsharp/fsharp.aspx. (To compile the sample on x64, you’ll need to modify the project settings to point to System.Core.dll in the correct location, as its path differs from the one used on x86.) In particular, notice that F# can make direct use of Parallel Extensions and also gets an excellent performance boost on multi-core PCs:

member this.RenderToArrayParallel(scene, rgb : int[]) =
    Parallel.For(0, screenHeight, fun y ->
        let stride = y * screenWidth
        for x = 0 to screenWidth - 1 do
            let color = TraceRay ({Start = scene.Camera.Pos; Dir = GetPoint x y scene.Camera }, scene, 0)
            let intColor = color.ToInt ()
            rgb.[x + stride] <- intColor)

Finally, a completely different ray tracer implementation is shown in Samples\LINQRayTracer.

When you run this ray tracer, you will notice that it has a more complicated scene and does not perform animation, but as it renders you will see the concurrency in effect as the rows get drawn. The original query was written in the form:

from y in Enumerable.Range(0, screenHeight)
…
select from x in Enumerable.Range(0, screenWidth)

which is a natural 2-D construct to loop over each pixel in each row. In order to make this run concurrently on a per-row basis (as for the other ray tracers), we only have to ask for the outer query to be run in parallel mode:

from y in Enumerable.Range(0, screenHeight).AsParallel()
…
select from x in Enumerable.Range(0, screenWidth)

This tells PLINQ to partition the work of the outer query across all the available cores. The other adjustment is to enumerate the query via

pixelsQuery.ForAll(row =>
{
    foreach (var pixel in row)
    {
        rgb[pixel.X + (pixel.Y * screenWidth)] = pixel.Color.ToInt32();
    }
    int processed = Interlocked.Increment(ref rowsProcessed);
    if (processed % rowsPerUpdate == 0 ||
        processed >= screenHeight) updateImageHandler(rgb);
});

which causes the query to run to completion in parallel. Notice that the per-row worker delegate updates the shared rowsProcessed counter with Interlocked.Increment(), so the count stays correct even when several rows finish at the same time.
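The counting logic can be isolated in a small sketch. It assumes the AsParallel() and ForAll() operators as they shipped in System.Linq with .NET Framework 4, and the updates counter is a hypothetical replacement for the call to updateImageHandler(), added only so that the effect of the throttling condition can be observed.

```csharp
using System.Linq;
using System.Threading;

static class ForAllSketch
{
    // Returns { rows processed, number of times an update would have fired }.
    public static int[] Run(int height, int rowsPerUpdate)
    {
        int rowsProcessed = 0;
        int updates = 0;

        Enumerable.Range(0, height).AsParallel().ForAll(y =>
        {
            // Interlocked.Increment returns the new value atomically, so each
            // row observes a distinct count from 1 to height.
            int processed = Interlocked.Increment(ref rowsProcessed);

            // Same throttling condition as in the sample: every
            // rowsPerUpdate-th row, plus the final row.
            if (processed % rowsPerUpdate == 0 || processed >= height)
                Interlocked.Increment(ref updates);
        });

        return new[] { rowsProcessed, updates };
    }
}
```

With a plain rowsProcessed++ instead, two threads could read the same old value and lose an increment, in which case the final update for the last row might never fire.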