Sergey Nenakhov posts to the DirectX developer list that he’s created a GPU profiling tool for Direct3D applications:
I’ve created a tool, which hooks directx api inside any desired application. Then each draw call will be wrapped with asynchronous timing query, and when timing info arrives for all draw calls made during a frame an overlay is rendered showing you the timing information of each dip (or memory transfer) in a compact and informative manner. For deeper analysis you can press the ‘~’ button (or whatever key you have below the ESC) to pause the sampling, and when paused, mousing over any bar will show you the callstack where that draw call was made with precise timing information about that call. I personally find this tool extremely useful because it can show you the bottleneck of your application very quick, and also it shows relative costs of various stuff you have going on in your application, giving you a significantly better understanding of the performance aspect of your game.
I haven’t tried this tool myself, but it looks interesting and could prove valuable to any Direct3D developer trying to performance-tune an application. (Remember: first get it right, then get it tight! Or, as Donald Knuth wrote, “Premature optimization is the root of all evil.”)
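The mechanism described in the quoted post (wrap each draw call in an asynchronous timing query, then show a frame's results only once every query for that frame has come back) can be sketched without any graphics API at all. What follows is a rough simulation in Python, not Nenakhov's implementation and not Direct3D code: `FakeGpu`, `TimingQuery`, `FrameProfiler`, the tick costs, and the fixed two-frame result latency are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


class FakeGpu:
    """Hypothetical stand-in for a GPU: a tick counter plus a result latency."""

    def __init__(self, latency_frames: int = 2) -> None:
        self.clock = 0                        # simulated GPU timestamp counter
        self.latency_frames = latency_frames  # frames before a result is readable

    def draw(self, cost_ticks: int) -> None:
        # Pretend the draw call consumed `cost_ticks` of GPU time.
        self.clock += cost_ticks


@dataclass
class TimingQuery:
    """Pair of timestamps around one draw call; readable only after a delay."""

    label: str
    begin_ticks: int
    end_ticks: int
    ready_at_frame: int

    def try_get(self, current_frame: int) -> Optional[int]:
        # Non-blocking poll: returns None until the result has "arrived",
        # so the CPU never stalls waiting for the GPU.
        if current_frame < self.ready_at_frame:
            return None
        return self.end_ticks - self.begin_ticks


class FrameProfiler:
    """Wraps draw calls in timing queries and reports only complete frames."""

    def __init__(self, gpu: FakeGpu) -> None:
        self.gpu = gpu
        self.frame = 0
        self.pending: Dict[int, List[TimingQuery]] = {}

    def profiled_draw(self, label: str, cost_ticks: int) -> None:
        begin = self.gpu.clock               # timestamp before the draw
        self.gpu.draw(cost_ticks)
        end = self.gpu.clock                 # timestamp after the draw
        query = TimingQuery(label, begin, end,
                            self.frame + self.gpu.latency_frames)
        self.pending.setdefault(self.frame, []).append(query)

    def end_frame(self) -> Dict[int, List[Tuple[str, int]]]:
        """Advance one frame; return timings for frames whose queries all arrived."""
        self.frame += 1
        completed: Dict[int, List[Tuple[str, int]]] = {}
        for frame_index in sorted(self.pending):
            queries = self.pending[frame_index]
            results = [q.try_get(self.frame) for q in queries]
            if all(r is not None for r in results):
                completed[frame_index] = [
                    (q.label, r) for q, r in zip(queries, results)
                ]
        for frame_index in completed:
            del self.pending[frame_index]
        return completed
```

In a real Direct3D application the timestamps would come from the API's own query objects rather than a counter, and the latency would vary from frame to frame. The point of the sketch is only the shape of the bookkeeping: results are polled rather than waited on, and a frame's timings are held back until the whole frame is accounted for, which is why an overlay built this way necessarily lags the live frame a little.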