VS 2010 Beta 2 Concurrency Resource Profiling In Depth First Look

October 19, 2009

Nearly everyone reading this is using a machine with multiple cores. With a basic laptop containing a dual core CPU and your average desktop with a four core CPU, we have processing power our computing ancestors could only dream about. Most of us developers are writing multithreaded applications to take advantage of that power, but there's a sad secret that we developers keep amongst ourselves. Unless you are some sort of savant who can visualize all your threads running in perfect parallel, you really don't have a good clue whether you're successfully taking advantage of those multiple threads.

The whole trick to multithreading is to never give up the time slice (called the quantum in kernel-speak). Just how do you give up the time slice? By synchronizing. Whenever you're waiting to acquire a synchronization object, you're not multithreading. Unfortunately, when developers are designing their code, they go through a few thought games and think they have an idea how things will work. What happens in production is that they don't seem to get the scalability they expect, but don't know why.

It turns out that many of these multithreading problems come down to one or more synchronization objects that your code is holding onto far longer than you expect. The trouble is that yesterday there was no way for your average developer to figure out which synchronization objects were causing all the contention without an extremely deep analysis: reading the source code and attempting to model the threads on paper.

Wouldn't it be nice if we could have a tool that would point out all the synchronization objects that our application is fighting over? That way we could focus our analysis on just the spots in the code that matter to see where the skirmishing between threads is causing us to give up the time slice. Today there is just such a tool in Visual Studio 2010 Beta 2 called the Concurrency profiler.

I've already written about the improvements in the Sampling and Instrumentation profilers, but Concurrency profiling is a brand new feature for Visual Studio 2010. As with the CPU and memory profilers, Concurrency profiling works just great on .NET 2.0 binaries, so you can start using it today to find those multithreading contentions and finally ensure you are multithreading correctly.

To enable the Concurrency profiler, open the General property page from the Performance Explorer, choose Concurrency, and check the "Collect resource contention data" field. If you want to be hard core, you can also start all your profiling from the command line with VSPERFCMD.EXE, where you'd use the /START:CONCURRENCY,RESOURCEONLY option. I'll talk more about the Visualization option in another article.

The Concurrency profiler hooks both the managed and native synchronization methods/functions and, when your code calls one, records the time spent blocking and the call stack. As we are still working with a Beta product, I haven't done any formal performance tests, but the performance difference is so negligible that you won't even know you're gathering the resource contention data. In fact, the first time I ran the Concurrency profiler, I thought it was not running because I couldn't tell a performance difference.
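To make "time spent blocking" concrete, here is a minimal sketch of measuring that quantity by hand for a single lock. It is only an illustration and not how the profiler hooks synchronization calls; BlockingTimer, RunLocked, and Demo are hypothetical names I made up for this example.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper for illustration only; it is not part of Visual Studio
// and does not reproduce how the profiler hooks synchronization calls.
public class BlockingTimer
{
    private long blockedTicks;   // TimeSpan ticks spent waiting, summed over all threads

    public TimeSpan TotalBlocked
    {
        get { return TimeSpan.FromTicks(Interlocked.Read(ref blockedTicks)); }
    }

    // Runs body while holding gate and records how long the caller waited for gate.
    public void RunLocked(object gate, Action body)
    {
        Stopwatch wait = Stopwatch.StartNew();
        Monitor.Enter(gate);               // blocks while another thread owns gate
        try
        {
            wait.Stop();
            Interlocked.Add(ref blockedTicks, wait.Elapsed.Ticks);
            body();
        }
        finally
        {
            Monitor.Exit(gate);
        }
    }

    // The calling thread holds the lock for about 100 ms while a worker waits for it.
    public static TimeSpan Demo()
    {
        BlockingTimer timer = new BlockingTimer();
        object gate = new object();
        ManualResetEvent workerStarted = new ManualResetEvent(false);

        Thread worker = new Thread(() =>
        {
            workerStarted.Set();
            timer.RunLocked(gate, () => { });   // the worker blocks in here
        });
        worker.Name = "Blocked Worker";

        lock (gate)
        {
            worker.Start();
            workerStarted.WaitOne();
            Thread.Sleep(100);
        }
        worker.Join();
        return timer.TotalBlocked;
    }

    public static void Main()
    {
        Console.WriteLine("Blocked for {0:F0} ms", Demo().TotalMilliseconds);
    }
}
```

Wrapping every lock like this by hand is exactly the tedium the profiler spares you, and the profiler also captures the call stack for each wait.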

To demonstrate the Concurrency profiler, I wrote a simple program that fired up two threads to execute the following method.

private void SyncBlockThread()
{
    for (int i = 0; i < 100; i++)
    {
        lock (syncBlock)
        {
            Trace.WriteLine(String.Format("{0} has the sync block",
                                         Thread.CurrentThread.Name));
            Thread.Sleep(randWait.Next(500));
        }
    }
}

 

After the two threads ended, I fired up two additional threads that both executed the following method.

private void MutexThread()
{
    for (int i = 0; i < 100; i++)
    {
        // Acquire outside the try block so that a failed wait never
        // reaches ReleaseMutex on a mutex this thread does not own.
        theMutex.WaitOne();
        try
        {
            Trace.WriteLine(String.Format("{0} has the mutex",
                            Thread.CurrentThread.Name));
            Thread.Sleep(randWait.Next(500));
        }
        finally
        {
            theMutex.ReleaseMutex();
        }
    }
}
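
If you want to reproduce the experiment, a driver along the following lines should work. This is a sketch under assumptions, not my original test program: the field names syncBlock, theMutex, and randWait come from the methods above, while the ContentionDemo class, the RunPair helper, the thread names, the constructor parameters, and the Acquisitions counter are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical driver; everything except the three field names is an assumption.
public class ContentionDemo
{
    private readonly object syncBlock = new object();
    private readonly Mutex theMutex = new Mutex();
    private readonly Random randWait = new Random();
    private readonly int iterations;
    private readonly int maxWaitMs;
    private int acquisitions;   // only touched while syncBlock or theMutex is held

    public ContentionDemo(int iterations, int maxWaitMs)
    {
        this.iterations = iterations;
        this.maxWaitMs = maxWaitMs;
    }

    public int Acquisitions
    {
        get { return acquisitions; }
    }

    private void SyncBlockThread()
    {
        for (int i = 0; i < iterations; i++)
        {
            lock (syncBlock)
            {
                acquisitions++;
                Trace.WriteLine(String.Format("{0} has the sync block",
                                              Thread.CurrentThread.Name));
                // Sleeping while holding the lock forces the other thread to block.
                Thread.Sleep(randWait.Next(maxWaitMs));
            }
        }
    }

    private void MutexThread()
    {
        for (int i = 0; i < iterations; i++)
        {
            theMutex.WaitOne();   // acquired before the try block
            try
            {
                acquisitions++;
                Trace.WriteLine(String.Format("{0} has the mutex",
                                              Thread.CurrentThread.Name));
                Thread.Sleep(randWait.Next(maxWaitMs));
            }
            finally
            {
                theMutex.ReleaseMutex();
            }
        }
    }

    // Starts two named threads on the same method and waits for both to end.
    private static void RunPair(ThreadStart work, string baseName)
    {
        Thread first = new Thread(work);
        first.Name = baseName + " 1";
        Thread second = new Thread(work);
        second.Name = baseName + " 2";
        first.Start();
        second.Start();
        first.Join();
        second.Join();
    }

    public void Run()
    {
        RunPair(SyncBlockThread, "SyncBlock");   // first pair fights over the lock
        RunPair(MutexThread, "Mutex");           // second pair starts after the first ends
    }

    public static void Main()
    {
        new ContentionDemo(100, 500).Run();
    }
}
```

Because the two pairs run one after the other and each call to randWait.Next happens while a lock or the mutex is held, the shared Random instance is never used by two threads at once.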

 

After your program ends or you detach the profiler, you get a view of your application you could only dream about before. The graph in the upper left corner shows you how many contentions were occurring as your application ran. In my example, there is a maximum of seven contentions, and contentions occur through most of the application's run. In an ideal world, you would never see this kind of graph. I'll admit that this is a contrived program, but it is an excellent example to show exactly what the Concurrency profiler can do.

Like the sampling profiler I wrote about previously, the Concurrency profiler defaults to "Just My Code," which means it only shows you the handles and threads where you have source code. If you need to see everything in the application because you are using libraries that don't have PDB files, click the Show All Code link in the Notifications section in the upper right corner to see all handle contentions across your application.

The two tables at the bottom show the story of the contentions causing the most trouble. In most cases, you care about the Most Contended Resources table, as that shows the synchronization objects the threads are fighting over. To analyze the fighting, click on the handle and you'll jump to the Resource Details view so you can see how much blocking is occurring on the resource.

The Total graph shows the combined blocking time of all threads blocking on that particular handle. Below that you see each thread showing exactly when, where (through the call stack), and how long a particular thread wasn't getting any work done. This is a brilliant view and will help point you right at those problems for a particular resource. Keep in mind that normal applications won't show patterns like my example here, but it's informative to see what the degenerate case looks like.

If you're curious about seeing a thread and all the different resources the thread is blocking on, click the thread id in the Summary view and you'll see the above view inverted, with the thread's blocking pattern as the top graph and all the different resource graphs below it. That's a handy view for threads that are grabbing synchronization resources from all over the place.

There are other views supported by the Concurrency profiler, such as the Call Tree view, which is a nice way to see the call path with the most contentions. However, after running many different types of applications under the Concurrency profiler, I've found that the Summary and Contentions by Handle/Thread views are the ones you're going to spend all your time analyzing. I love tools that are simple, yet solve a very hard problem.

After playing with the Concurrency profiler for a while, I've noticed a few things that will help you out. The first is that if you're writing .NET applications and you totally control the threads, in other words, you're not using a thread queue, set the thread Name property. That makes it easier to identify the threads than going by the method name. For native C++ applications, the thread naming trick that works in the debugger does not work in the Concurrency profiler.
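
As a minimal sketch of the naming tip, a small helper can make sure every thread you create gets a name before it starts. NamedThreads and its Start method are hypothetical names for this example; Thread.Name is the real property.

```csharp
using System;
using System.Threading;

// Hypothetical helper; only Thread and its Name property are real API here.
public static class NamedThreads
{
    // Creates a thread, names it, and then starts it, so the name is
    // already in place when the thread begins running.
    public static Thread Start(string name, ThreadStart work)
    {
        Thread thread = new Thread(work);
        thread.Name = name;
        thread.Start();
        return thread;
    }

    public static void Main()
    {
        Thread worker = Start("Invoice Writer",
            () => Console.WriteLine(Thread.CurrentThread.Name + " is running"));
        worker.Join();
    }
}
```

Pick names that describe the thread's job, since that is what you will be scanning for in the profiler's thread list.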

The second trick is simple. If you want to run Concurrency profiling on a WPF, Windows Forms, or console application, I find it best to disable the Visual Studio hosting process in the solution property pages. Running under the hosting process adds additional threads and synchronization to your application and by getting those out of the way it's easier to see just your stuff.

The last trick I'm seeing is that to get the best information about the contention, you need to stress the application. Just doing a quick run in most cases won't give you a good feel for the application threading. The Concurrency profiler needs to be a "must do" task for your stress tests. The more you bang on the application, the more likely you'll start getting the real picture of your threading.

As we are at Beta 2, I'm hoping that the team can fix one thing in the Concurrency profiler. All synchronization objects are reported as "handles." I would love to see the type of handle and, more importantly, the handle name if one was specified at creation. That would make it much easier to identify your objects at a glance. It seems easy to me to accomplish this by hooking all the handle creation methods and recording those with names specified. Of course, if it were easy, the team would have done it already. However, I'll be filing this as a feature request and keeping my fingers crossed that my dream comes true at RTM.

The Concurrency profiler offers a fascinating insight into your threads. Even if you don't believe you have a problem, it's one of those tools you need to run regularly to get a feel for your application's behavior. Half of performance tuning is understanding your normal behavior so you know when you're dealing with an outlier condition. Remember, the Concurrency profiler works fine on today's binaries so you can, and should, start using it immediately.


14 Comments

  • MaciejS October 19, 2009 5:32 PM

    Wow, looks nice. Does it work for C++ applications as well, or is it C# only?

  • jrobbins October 19, 2009 5:39 PM

    MaciejS,

    The Concurrency Resource Profiling works fantastic for both C# and C++!

    - John Robbins

  • Normal People Bore Me! October 20, 2009 5:10 AM

    For those who want to dig deeper into what's new

  • Brien October 21, 2009 3:28 PM

    I'm seeing contention when using Monitor.Wait/Pulse for thread coordination.

    After thinking this through, it seems inherent in the design (see below).

    Is there a lighter weight mechanism for achieving this that is contention free?

    Brien

    -----

    The standard usage looks like:

    Thread A:

    lock (monitor) {
        while (!CheckState()) {
            Monitor.Wait(monitor); // we're here
        }
    }

    Thread B:

    lock (monitor) {
        ChangeState();
        Monitor.PulseAll(monitor);
        //

  • jrobbins October 22, 2009 12:58 PM

    Brien,

    If you're using lock or Monitor, there's no way to be contention free. :) You'll need to do something like Jeffrey Richter's Async Enumerator, or use the Interlocked class to achieve contention free threading. Make sense?

    - John Robbins

  • John Robbins' Blog October 23, 2009 11:59 AM

    One of the most important technologies used at Microsoft is Event Tracing for Windows (ETW), especially

  • Brien October 26, 2009 11:40 AM

    Hi John,

    Async Enumerator looks like it uses the Thread Pool and I don't see how Interlocked can provide a wait mechanism.

    If I want to implement a Producer/Consumer Queue with a Blocking Pop I would code a call to Wait() on a monitor with the Push calling Pulse.

    I would like to avoid contention, but as I mentioned, it's inherent to monitor. What I'm looking for is an alternate way to signal/wait that is safe and contention free.

    Thanks,
    Brien

  • jrobbins October 26, 2009 8:47 PM

    Brien,

    Gee, that's a pretty tall order. :) Off the top of my head I don't see how it's possible to do without contention. If you do figure it out, it'll make you rich and famous. :)

    - John Robbins

  • Adam October 29, 2009 5:40 AM

    I have a multithreaded C++ application built with VS 2003. It crashes on multi-core machines. I haven't been able to figure it out. Is there a tool for 2003 to figure this out, or do I need to port this app to VS 2010 to make use of the Concurrency Profiler? I'm inclined towards the first option!

  • jrobbins October 29, 2009 11:40 AM

    Adam,

    You don't need to port anything. As I mentioned, the Concurrency Resource Profiler can work on any random binary so use VS 2010 to analyze your existing binary. Take advantage of this gift Microsoft has given us! :)

    - John Robbins

  • joe January 8, 2010 7:35 PM

    How did you implement randWait() in your test? I'm curious about the correct way to use Random.Next in a safe concurrent way.

    Thanks.

  • jrobbins March 11, 2010 12:20 AM

    Joe,

    I used a lock around it.

    - John Robbins

  • mgoldin's blog June 8, 2010 3:23 PM

    Visual Studio 2010 brings a number of innovations to multithreaded and parallel computing. One of them

  • Visual Studio Profiler Team Blog June 8, 2010 3:28 PM

    Visual Studio 2010 brings a number of innovations to multithreaded and parallel computing. One of them
