CPU Benchmarking 101
Benchmarking a PC is essential if you want to separate the hype from the facts. While component and PC manufacturers constantly brag about why this is the "best part ever," benchmarking is where the rubber meets the road. Specs are interesting, but putting a system through its paces will tell you more about its true performance.
In order to test a system, reviewers, including Maximum PC, rely on benchmark suites. When a new CPU is introduced, running a full battery of benchmarks and comparing the results to those of previous-generation processors allows us to measure performance changes. We use our own selection of tests, which we've dubbed the Zero-Point Suite. Like all good suites, it covers a variety of workloads. As PCs change over time, the Zero-Point Suite grows and is modified to stay relevant to current performance needs.
While some of the tests are available to end users, others are for the pros and are not as easily reproducible by home users. Besides having access to the software, you need appropriate test files, such as the content we use for our video and image-manipulation tests. But that doesn't mean you can't create your own test suite, which is why we're calling this "Home Benchmarking." We'll look at widely available software solutions and run some essential benchmark tests that give a good idea of CPU performance. (Note that not all of our benchmarks are difficult to reproduce at home; e.g., Cinebench and Techarp's x264 tests are easy to acquire and run.)
For example...
Theory aside, a home user may want to compare the performance of systems for which benchmark results aren't widely available. This frequently happens in two scenarios: older systems that are no longer included when the latest processors are tested, and mobile processors that do not get as much attention as their desktop brethren. So, to provide a real-world example of homebrew benchmarking, let's compare three CPUs.
First, we have the desktop AMD Phenom II X4 945. This chip is well past its prime (it was introduced way back in 2009), but it is a quad-core design with a 3GHz clock speed. It features a 95W Thermal Design Power (TDP); the 45nm process dates it, though the 6MB of L3 cache does give it some flabby muscle. It doesn't have Hyper-Threading or any other form of SMT (Simultaneous Multi-Threading), so this is a true quad-core part. The aging architecture may not compete with modern designs when it comes to instructions per clock (IPC), but many users are still using similar hardware.
Next, we have the Intel Core i5-5200U. This is a relatively modern Broadwell part from the mobile family, only recently superseded by the new mobile Skylake parts. It's found in many mainstream notebooks and is built on the current state-of-the-art 14nm process. Skylake parts are also 14nm, and as they become more available they're edging out their Broadwell siblings. Not only is this a mobile part, but it's a ULV (Ultra-Low Voltage) model rated at just 15W TDP, and it draws far less at idle. It was released in Q1 2015 and has 3MB of L3 cache. It is a dual-core part with Hyper-Threading, which gives it four logical cores. It has a base clock speed of 2.2GHz and can turbo up to 2.7GHz for short periods of time as needed.
Finally, we have a second laptop sporting an Intel Core i7-4702HQ, one of the first generation of mobile Haswell parts. This processor is nearly three years old now, but it's interesting in that it was one of Intel's first sub-45W quad-cores, rated at 37W TDP. With Hyper-Threading, its four cores handle eight threads. It's easy to get caught up chasing newer technology, but higher numbers aren't the only thing that matters; in the Intel CPU world, letters are sometimes more important.
So, you might be wondering how the relatively new mobile processor compares to an older mobile part, or to a still older desktop part. Finding direct comparisons on the web can be difficult, given the age gap and target markets. The best comparison will come from running benchmarks ourselves, and with knowledge of the workloads, we can analyze the results.
Reproducibility
Before we get to the actual software, a few ground rules help ensure that the results are accurate and comparable. Benchmarking software should be run with the same settings on all the machines used, or else the results can't be compared. In addition, the same version of the software should be used. Keep background activity to a minimum as well: defragmenting your hard drive or installing Windows updates will suck up system resources and skew the benchmark results. Next, the benchmark should be run from the main system drive, not from an external drive such as a USB stick, as that can also hurt performance. Finally, the software should be extracted, and not run from within a compressed file.
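One way to keep yourself honest is to write down the conditions of each run and refuse to compare results when those conditions differ. The Python sketch below is only an illustration of that idea: the RunRecord fields, the function name, and the sample values are hypothetical and are not part of any benchmark tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunRecord:
    """Conditions under which one benchmark run was made (hypothetical fields)."""
    machine: str
    benchmark: str
    version: str
    settings: str
    run_from_internal_drive: bool
    background_tasks_idle: bool


def comparability_problems(a: RunRecord, b: RunRecord) -> List[str]:
    """Return the reasons two runs should not be compared (empty list means OK)."""
    problems = []
    if a.benchmark != b.benchmark:
        problems.append("different benchmark")
    if a.version != b.version:
        problems.append("different software version")
    if a.settings != b.settings:
        problems.append("different settings")
    for run in (a, b):
        if not run.run_from_internal_drive:
            problems.append(f"{run.machine}: run from an external drive")
        if not run.background_tasks_idle:
            problems.append(f"{run.machine}: background tasks were active")
    return problems


# Sample records with made-up conditions, for illustration only.
desktop = RunRecord("desktop", "7-Zip", "15.14", "32MB dictionary", True, True)
laptop = RunRecord("laptop", "7-Zip", "15.14", "32MB dictionary", True, False)
print(comparability_problems(desktop, laptop))
```

An empty list means the two runs were made under matching conditions; anything else is a reason to rerun before drawing conclusions.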
y-cruncher 0.6.9
Calculating Pi has been a mathematical competition of sorts dating back to the ancients. The first computer to calculate a million digits was a CDC 7600 (a Seymour Cray design) back in 1973, and it took 23.3 hours. With modern computers and some freeware, it can be easily done in just a few seconds. Calculating Pi tells us about the floating-point performance of the CPU.
An earlier program, SuperPI, used to be a staple for benchmarks, and was especially popular with overclockers. It was followed by HyperPI, a more modern derivation. Both programs have limitations: the results are not consistently reproducible, and a single calculation does not scale across multiple cores. This type of number-crunching calculation is heavily dependent on CPU clock speed, and in general these programs favor Intel chips over AMD CPUs.
HyperPI does let you calculate Pi simultaneously on multiple cores, but it runs the same calculation on each core rather than combining all cores to solve one calculation. A better alternative is y-cruncher. This program is quite fast, and the results are consistent. In addition, the workload can be assigned to a single thread or to multiple threads, and the results scale. SuperPI is often run to calculate Pi to one million digits, but that takes less than a second on y-cruncher, so we ran the tests to 100 million digits.
Let's look at the results of the y-cruncher benchmark on our three contenders:
- AMD Phenom II X4 945: single-threaded 174.3 seconds, multi-threaded 71.1 seconds
- Intel Core i5-5200U: single-threaded 66.3 seconds, multi-threaded 31.7 seconds
- Intel Core i7-4702HQ: single-threaded 56.4 seconds, multi-threaded 15.0 seconds
Our aging AMD quad-core simply cannot keep up in floating-point performance on a single-threaded workload, where the Intel i5-5200U is 2.63 times faster. Even in a low-power mobile part, the IPC improvements across several generations of Intel processors propel the i5-5200U to an easy victory. Meanwhile, the Intel mobile quad-core is only moderately faster than the newer dual-core in single-threaded performance, but with all threads in use its extra cores let it easily surpass the ULV part. Score one for Intel's "Blue team."
We also get a glimpse into how well the work scales across multiple cores and threads. In the case of the Intel i5-5200U, going from a single thread to multiple threads, the calculation is done 2.09 times faster. As this CPU is a dual-core part, this makes sense: both cores get worked, and the gain beyond 2X comes from Hyper-Threading. The i7-4702HQ doesn't scale quite as well per core, but it may be running into thermal constraints (lower clock speed under load), so the 3.76X improvement going from one thread to eight threads is still quite good.
The AMD Phenom II X4 945 does not scale as well when going from a single thread to multiple threads. This CPU has four physical cores but no Hyper-Threading. The lack of Hyper-Threading explains part of the gap, but a multi-threaded run that is only 2.45 times faster than the single-threaded run on four cores again points to lower efficiency in the older part.
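The scaling figures quoted here are simple ratios, and you can compute them for your own runs. The short Python sketch below uses the y-cruncher times listed above; the function names are ours, and dividing the speedup by the number of physical cores is just one rough way of expressing per-core scaling, not an official metric.

```python
# y-cruncher times in seconds (100 million digits), copied from the results
# above. Physical core counts come from the CPU descriptions in this article.
results = {
    "AMD Phenom II X4 945": {"single": 174.3, "multi": 71.1, "physical_cores": 4},
    "Intel Core i5-5200U": {"single": 66.3, "multi": 31.7, "physical_cores": 2},
    "Intel Core i7-4702HQ": {"single": 56.4, "multi": 15.0, "physical_cores": 4},
}


def speedup(single_s: float, multi_s: float) -> float:
    """How many times faster the multi-threaded run finished."""
    return single_s / multi_s


def per_core_scaling(single_s: float, multi_s: float, physical_cores: int) -> float:
    """Speedup divided by physical cores; 1.0 would be perfect core scaling."""
    return speedup(single_s, multi_s) / physical_cores


for cpu, r in results.items():
    s = speedup(r["single"], r["multi"])
    e = per_core_scaling(r["single"], r["multi"], r["physical_cores"])
    print(f"{cpu}: {s:.2f}x speedup, {e:.0%} of ideal per-core scaling")
```

A value above 100 percent, as on the dual-core i5-5200U, is possible because Hyper-Threading lets each physical core work on two threads at once.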
7-Zip
For our next benchmark, we'll use 7-Zip. This is the freeware file compression alternative to WinZip. While it is plenty handy for compressing and decompressing files, a useful feature is the benchmarking capability. The program version used is 7-Zip 15.14, and the feature needed is located under the Tools > Benchmark menu.
File compression and decompression are good multi-threaded workloads, and they show how the CPU will perform on a high computation task. Another strength is that 7-Zip is a real-world benchmark, in that just about everyone has to deal with a .zip file at some point. It should be pointed out that a limitation of 7-Zip is that it's based on a compression algorithm that is not only dependent on CPU performance, but is also sensitive to memory performance, so memory speeds as well as dual-channel or quad-channel configuration can influence the outcome.
The default settings use a 32MB dictionary with the maximum number of CPU threads supported (four or eight on our test systems). This benchmark reports the Compressing and Decompressing scores separately, and as it reaches a steady state, a "Total Rating" score gets reported. On the i5-5200U, for example, we get a Total Rating of 7,684 MIPS (million instructions per second).
Let's see how the CPUs perform:
- AMD Phenom II X4 945: 9,447 MIPS
- Intel Core i5-5200U: 7,684 MIPS
- Intel Core i7-4702HQ: 17,027 MIPS
This result is interesting as the "Green Team's" AMD Phenom II X4 945 pulls off a 23 percent lead over the i5-5200U. The four physical cores give the AMD part an edge on this multi-threaded workload, but the i5-5200U is not all that far behind even with two fewer physical cores, showing the value of its higher IPC, Hyper-Threaded cores, and Turbo Boost clock speeds. The quad-core fourth-generation Intel chip continues its lead, with performance that's 80 percent faster than the Phenom and 122 percent faster than the low-power Broadwell part. Power is clearly limiting Broadwell-U here, as we've normally measured a 5–10 percent increase in performance over Haswell at the same clock speed.
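Percentages like these are easy to get backwards, so it helps to be explicit about which chip is the baseline. The Python sketch below works from the Total Rating scores listed above; the percent_faster helper is our own name for a one-line formula, not something 7-Zip provides.

```python
def percent_faster(score: float, baseline: float) -> float:
    """Percentage by which score exceeds baseline (higher score means faster)."""
    return (score / baseline - 1.0) * 100.0


# 7-Zip Total Rating in MIPS, copied from the results above.
mips = {
    "Phenom II X4 945": 9447,
    "i5-5200U": 7684,
    "i7-4702HQ": 17027,
}

pairs = [
    ("Phenom II X4 945", "i5-5200U"),
    ("i7-4702HQ", "Phenom II X4 945"),
    ("i7-4702HQ", "i5-5200U"),
]
for cpu, baseline in pairs:
    gain = percent_faster(mips[cpu], mips[baseline])
    print(f"{cpu} vs. {baseline}: {gain:+.0f} percent")

# Swapping the baseline changes the number: the same gap, seen from the
# other side, is a deficit rather than a lead.
print(f"i5-5200U vs. Phenom II X4 945: "
      f"{percent_faster(mips['i5-5200U'], mips['Phenom II X4 945']):+.0f} percent")
```

Note that the same gap reads as roughly 23 percent when the slower chip is the baseline and roughly 19 percent when the faster chip is the baseline, so always state which way the comparison runs.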
Cinebench R15
Cinebench R15 is a benchmark included in many testing suites, including ours at Maximum PC. The download includes the data set, so the home user can run the same set that the "Big Boys" use. Another feature is that it can be run in both single-threaded and multi-threaded modes to isolate the benefit of multiple cores and Hyper-Threading. By default, it runs in multi-threaded mode, so you need to enable the "advanced" options for the single-threaded testing.
Here are our Cinebench R15 scores:
- AMD Phenom II X4 945: single-threaded 70 cb, multi-threaded 217 cb
- Intel Core i5-5200U: single-threaded 90 cb, multi-threaded 233 cb
- Intel Core i7-4702HQ: single-threaded 121 cb, multi-threaded 577 cb
The scores get reported in units of "cb," which is a relative score for the test. The increased efficiency of the i5-5200U leads it to a victory in both the single- and multi-threaded modes. Even with its four cores, the AMD Phenom II X4 945 gets beaten by the dual-core part. Overall, in multi-threaded mode, the i5-5200U is 7.4 percent faster. The older i7-4702HQ meanwhile posts substantially higher results, thanks to the higher TDP and clock speeds. There are times when the progress from generation to generation of CPU seems slow, but when you see a 37W mobile part more than double the performance of a 2009-era 95W desktop part, it's still impressive.
Fritz Chess Benchmark
Fritz Chess Benchmark is another multi-threaded workload, based on the calculations used for playing chess in the Fritz chess software. While the benchmark version is now several years old, it is fairly intensive and can still give modern CPUs a good workout.
When the program is first run, the Fritz Chess Benchmark will identify the number of logical processors. By default, all logical processors are used, though you can specify a lower or higher number if desired. Generally, the full number should be used, so we use four processors on the Phenom II and i5-5200U, and eight on the i7-4702HQ.
The benchmark result is given in two numbers. One is the "Kilo nodes per second," which is a number that indicates the raw number-crunching performance for the benchmark. The other is the "Relative speed," which is a ratio for the tested processor compared to a Pentium III clocked at 1GHz.
So, how did our CPUs do on the Fritz Chess Benchmark?
- AMD Phenom II X4 945: 13.86 Relative speed
- Intel Core i5-5200U: 9.89 Relative speed
- Intel Core i7-4702HQ: 23.10 Relative speed
Once again, the physical cores of the venerable Phenom II 945 hit their stride and take the lead over the dual-core Broadwell part; the Phenom II is actually 40 percent faster on this run. Meanwhile, the quad-core Haswell chip is still 67 percent faster than the Phenom.
CPU UserBenchmark
CPU UserBenchmark takes a Web 2.0 crowd-sourced approach to the benchmarking problem. The software tests all the major systems of the PC, including the CPU, storage, GPU, and memory. We'll focus on the CPU aspect here, to eliminate other variables.
The CPU is tested using a single-threaded workload, a quad-core workload, and a multi-core workload. This gives a more complete idea of the performance across a variety of situations. The program then assigns raw point values on each of the tests.
The real value comes from the comparison to the crowd-sourced data, which allows you to compare your CPU to others in the database. For example, the i5-5200U in our collection came up as a little above average at 57.5 percent, in the 69th percentile. As our CPU is running at stock speeds, it's reassuring to see the CPU performing properly.
The CPU UserBenchmark website is also useful for looking at other CPUs that have been benchmarked. For example, when considering a CPU upgrade, you can see, based on actual metrics and not just press release hype, how much faster one CPU is in relation to another, and across what types of workloads.
Comparing our CPUs, UserBenchmark shows the old Phenom and mobile Broadwell are closely matched overall, with the Phenom II X4 945 coming up as five percent faster. Getting into the granular details, the i5-5200U is 24 percent faster in single-core speed, while the AMD Phenom II X4 945 is 21 percent faster in quad-core speed. This is also consistent with the previous benchmarks we have discussed.
The higher-TDP Haswell CPU, on the other hand, easily wins once more. Single-core performance is 20 percent faster than the ULV Broadwell and 49 percent faster than Phenom II, quad-core performance is 70 percent faster vs. the i5-5200U and 40 percent faster than Phenom, and multi-core performance is 131 percent and 93 percent faster, respectively. More cores and more power combined with newer architectures make modern laptops quite potent.
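When a site gives you several pairwise percentages, a quick sanity check is to confirm that they agree with one another. Percentage leads multiply rather than add, as the Python sketch below illustrates with the figures quoted above. The helper names are ours, and small mismatches are expected because the published figures are rounded.

```python
def chained_gain(*percent_gains: float) -> float:
    """Combine successive 'X percent faster' steps into one overall percentage."""
    factor = 1.0
    for gain in percent_gains:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0


def implied_gain(a_over_c: float, b_over_c: float) -> float:
    """Lead of A over B when both leads are quoted against a common chip C."""
    return ((1.0 + a_over_c / 100.0) / (1.0 + b_over_c / 100.0) - 1.0) * 100.0


# Single-core: i7 is +20% over the i5, and the i5 is +24% over the Phenom.
print(f"i7 vs. Phenom, single-core: {chained_gain(20, 24):.1f} percent")

# Quad-core: i7 is +40% over the Phenom, and the Phenom is +21% over the i5.
print(f"i7 vs. i5, quad-core: {chained_gain(40, 21):.1f} percent")

# Multi-core: i7 is +131% over the i5 and +93% over the Phenom, which implies
# the Phenom's own multi-core lead over the i5.
print(f"Phenom vs. i5, multi-core: {implied_gain(131, 93):.1f} percent")
```

The first two results land within a point of the 49 percent and 70 percent figures quoted above, which suggests the pairwise numbers hang together.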
As a one-stop benchmark, UserBenchmark provides a good snapshot of the performance of a system, and in particular the CPU across various workloads. The ability to compare the performance to identical CPUs, as well as competitor processors, increases the value of the data.
Give it a whirl!
With an array of options, benchmarking is within easy reach for modern computer users. We didn't touch on x264 HD, PCMark/3DMark, or any number of other benchmarks you can freely acquire and run, but this short list should suffice as a way to get started.
One difficulty that remains is understanding exactly which aspects of a CPU are being tested, and unfortunately, there are many benchmarks where this information isn't readily available. Does a benchmark stress the floating-point, integer, memory, storage, or other areas? How indicative of real-world use are the tests? These are all things to consider, and the more data you have, the better.
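If you want a single number out of several benchmarks, one common approach is to express each result relative to a baseline system and then take the geometric mean of those ratios. The Python sketch below does this with the multi-threaded results from this article, using the Phenom II X4 945 as the baseline. Treat it as an illustration only: the equal weighting of the four tests and the choice of baseline are our assumptions, and this is not how any particular benchmark suite computes its scores.

```python
from statistics import geometric_mean

# Multi-threaded results copied from this article.
# y-cruncher is a time in seconds (lower is better); the rest are scores
# (higher is better).
results = {
    "Phenom II X4 945": {"y-cruncher": 71.1, "7-Zip": 9447, "Cinebench": 217, "Fritz": 13.86},
    "i5-5200U": {"y-cruncher": 31.7, "7-Zip": 7684, "Cinebench": 233, "Fritz": 9.89},
    "i7-4702HQ": {"y-cruncher": 15.0, "7-Zip": 17027, "Cinebench": 577, "Fritz": 23.10},
}
LOWER_IS_BETTER = {"y-cruncher"}


def relative_scores(cpu: str, baseline: str) -> dict:
    """Per-test ratio versus the baseline; above 1.0 means faster than baseline."""
    ratios = {}
    for test, value in results[cpu].items():
        base = results[baseline][test]
        ratios[test] = base / value if test in LOWER_IS_BETTER else value / base
    return ratios


def composite(cpu: str, baseline: str = "Phenom II X4 945") -> float:
    """Equal-weight geometric mean of the per-test ratios."""
    return geometric_mean(relative_scores(cpu, baseline).values())


for cpu in results:
    print(f"{cpu}: {composite(cpu):.2f}x the baseline")
```

The geometric mean is used here instead of a plain average because it treats a test where a chip is twice as fast and a test where it is half as fast as canceling out.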
Ultimately, running benchmarks is a great way to verify that your performance is in line with your expectations, and that everything is functioning properly. Online benchmarking takes this a step further by allowing you to compare your system with others, and tools like NZXT's CAM can look at more than just your CPU. Just remember: Knowledge is power, and with great power comes great responsibility.