HARDOCP - Introduction - DX11 vs DX12 Intel CPU Scaling and Gaming Framerate

DX11 vs DX12 Intel CPU Scaling and Gaming Framerate

Last week we set our sights on seeing how well the new DX12 API is able to distribute workloads across multiple CPU cores on a new AMD processor. Today we are going to continue, except this time we will be using an Intel Haswell-E processor that has a lot of cores available for DX12 usage. There are a couple of new GPUs in the mix as well.

DX12 and CPU Usage

I am going to do a bit of cut and paste here from our last article, as I think the introduction needs to be made again.

For the last year, the Video Card Forums have been abuzz with "very adult" discussions about DX12 and the benefits attached to it. One thing that has been discussed a great deal is how DX12 will better utilize our multicore CPUs. We have recently covered Rise of the Tomb Raider and its DX12 patch. We took notice of one statement that the developer made about its DX12 implementation.

Even though the game can use all your CPU cores, the majority of the DirectX 11 related work is all happening on a single core. With DirectX 12 a lot of the work is spread over many cores, and the framerate of the game can be much higher for the same settings.

For Rise of the Tomb Raider the largest gain DirectX 12 will give us is the ability to spread our CPU rendering work over all CPU cores, without introducing additional overhead.

This is surely a feature that all computer hardware enthusiasts want to see working! Jurjen Katsman, Studio Head at Nixxes Software, pointed out that they increased the framerate from 46 FPS to 60 FPS in a particular scene by moving to DX12.

As an example to illustrate the point, below is a screenshot of a scene in the game running on an Intel i7-2600 processor with 1333MHz memory, paired with a GTX 970. Using DirectX 11 at High Settings we would only get 46 fps. Now look at the same location with the new DirectX 12 implementation, we can lift it up to 60!
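To put the quoted 46 fps and 60 fps figures in perspective, here is a minimal Python sketch that converts them into a percentage gain and into per-frame time. The helper names `frame_time_ms` and `percent_gain` are made up for this illustration; only the two framerate values come from the Nixxes statement.

```python
def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def percent_gain(before_fps: float, after_fps: float) -> float:
    """Relative framerate increase, as a percentage of the starting value."""
    return (after_fps - before_fps) / before_fps * 100.0


# The two framerates quoted by Nixxes for the same scene.
dx11_fps = 46.0
dx12_fps = 60.0

gain = percent_gain(dx11_fps, dx12_fps)
saved_ms = frame_time_ms(dx11_fps) - frame_time_ms(dx12_fps)

print(f"DX11 frame time: {frame_time_ms(dx11_fps):.2f} ms")
print(f"DX12 frame time: {frame_time_ms(dx12_fps):.2f} ms")
print(f"Framerate gain:  {gain:.1f}%")
print(f"Time saved:      {saved_ms:.2f} ms per frame")
```

In other words, the claimed improvement works out to roughly a 30% higher framerate, or about 5 ms shaved off each frame in that one scene.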

Benchmarking the DX12 Data

This all got me excited about collecting data points and sharing them here, so I dove into the Rise of the Tomb Raider built-in benchmark on my CPU and motherboard test bench. My goal was to find CPU-limited workloads and see how performance scales as we relieve the CPU limitation. Of course, we want to do this both in system environments that are somewhat GPU limited and in ones that are not GPU limited at all.
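The idea of being CPU limited versus GPU limited can be sketched with a deliberately simplified model in Python. It assumes that CPU and GPU work overlap perfectly, so the slower of the two sets the frame time. That is an assumption made for illustration, not a description of how any particular game or driver behaves, and all of the millisecond figures below are hypothetical.

```python
def estimated_fps(cpu_ms: float, gpu_ms: float) -> float:
    """Framerate when the slower of CPU and GPU sets the frame time.

    Simplifying assumption: CPU and GPU work overlap perfectly.
    """
    return 1000.0 / max(cpu_ms, gpu_ms)


def limiting_factor(cpu_ms: float, gpu_ms: float) -> str:
    """Name the component that bounds the framerate in this model."""
    if cpu_ms > gpu_ms:
        return "CPU"
    if gpu_ms > cpu_ms:
        return "GPU"
    return "balanced"


# Hypothetical per-frame costs: the API change cuts CPU time from 20 ms to 12 ms.
cpu_before_ms, cpu_after_ms = 20.0, 12.0

for gpu_ms in (10.0, 25.0):  # a fast GPU and a slow GPU (made-up values)
    before = estimated_fps(cpu_before_ms, gpu_ms)
    after = estimated_fps(cpu_after_ms, gpu_ms)
    print(
        f"GPU {gpu_ms:.0f} ms: {before:.1f} -> {after:.1f} FPS "
        f"(limited by {limiting_factor(cpu_before_ms, gpu_ms)} before, "
        f"{limiting_factor(cpu_after_ms, gpu_ms)} after)"
    )
```

With these made-up numbers, the system with the fast GPU goes from 50 to about 83 FPS when the CPU cost drops, while the system with the slow GPU stays at 40 FPS. That is why the testing needs both kinds of environments: a CPU-side improvement only shows up in the framerate when the CPU was the limit in the first place.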

To make a long story short, after running well over 300 RoTR benchmarks, I did not have the data that I needed to show the differences outlined by Nixxes' screenshots. And quite frankly, a screenshot of a single point in a game is worthless for actually proving the point. Since we do not have a way of collecting real-world gameplay framerate data in DX12 as of yet, our hands are tied to the in-game benchmark. While the average framerates are solid in terms of data accuracy, we were not seeing any support for the DX12 advantage statements above.

I looked at utilizing the minimum framerate numbers generated by the benchmark, but this benchmark is simply not a good tool for collecting minimum framerate data. When looking at minimum framerates produced by the RoTR benchmark, you will see scores vary by up to 33% from run to run. With that in mind, I ran large sets of benchmarks with the intention of finding some good data after averaging the numbers. That was fruitless as well. The Rise of the Tomb Raider benchmark SUCKS when it comes to utilizing the minimum framerate data. The benchmark simply does not sample a wide enough swath of data; it is too narrow in scope. I have seen some folks throwing these numbers around as "proof" of their points; just let me say that a good description of that would be "bullshit." In terms of canned benchmarks, Rise of the Tomb Raider's is one of the worst I have seen implemented in a long time... not that I look at those much, to be honest. So I moved on.
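For readers who want to check run-to-run consistency on their own benchmark logs, the sketch below shows one way to quantify it in Python. The per-run numbers are invented placeholders, not data from this article, and `spread_percent` is just one possible reading of "variance run to run" (the gap between the best and worst run, relative to the worst run).

```python
import statistics


def spread_percent(values: list[float]) -> float:
    """Gap between the best and worst run, as a percentage of the worst run."""
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest * 100.0


def summarize(label: str, values: list[float]) -> None:
    """Print mean, sample standard deviation, and spread for a set of runs."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # needs at least two runs
    print(
        f"{label}: mean {mean:.1f} FPS, stdev {stdev:.2f}, "
        f"spread {spread_percent(values):.1f}%"
    )


# Hypothetical results from four runs of the same benchmark and settings.
average_fps_runs = [58.0, 59.0, 58.5, 58.8]
minimum_fps_runs = [30.0, 40.0, 33.0, 36.0]

summarize("Average framerate", average_fps_runs)
summarize("Minimum framerate", minimum_fps_runs)
```

In this made-up example the averages agree to within about 2%, while the minimums differ by about 33%. A metric with that much spread needs far more runs, or a longer benchmark, before a small DX11 vs. DX12 difference can be separated from noise.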

Ashes of the Singularity, currently the poster child for Async Compute (yes, I will touch on Async later as well), also has a built-in benchmark. Its benchmark is actually very well constructed for testing CPU workload scaling in DX11 vs. DX12. We did a preview of the game's performance using its canned benchmark tool last week: Ashes of the Singularity Day 1 Benchmark Preview. After running about 150 sets of benchmarks with the AotS benchmark tool, I felt as though we had some solid data to share with our readers. (Yeah, we actually run the benchmark more than once instead of just starting to post data!)

In terms of visual fidelity, AotS is not that impressive. In fact, with all of its graphical features turned on at 1440p with DX12, an AMD Radeon R9 Fury X card, and the right CPU, you can make the benchmark 0% GPU limited. This is, however, a Real Time Strategy game that can get to a massive scale in terms of units on the screen, so it is quite able to generate a heavy CPU workload, depending on the system. So while AotS is a good DX12 benchmark in terms of CPU workload, it is not much of a GPU testbed, which is perfect for what we are doing here today.

Linked below is a full resolution AotS screenshot saved from the beginning of the benchmark as a PNG file. The linked image is 6MB. Not to bag on the game, but there is just not much eye candy to look at.