Category: Benchmark metrics

What’s in a name?

A couple of weeks ago, the German site Notebookcheck published a review of the Huawei P8lite. We were pleased to see that they used WebXPRT 2015, on which the P8lite earned an overall score of 47. This week, AnandTech published its own review of the P8lite, in which the phone earned an overall score of 59!

Those scores are very different, but it wasn't difficult to figure out why. The P8lite comes in two versions, depending on your market. The version Notebookcheck tested is based on HiSilicon's Kirin 620 SoC, while the version AnandTech tested is based on Qualcomm's Snapdragon 615. It's also worth noting that Notebookcheck's phone was running Android 5.0, while AnandTech's was running Android 4.4. With different hardware and different operating systems, it's no surprise that the results differed.

One consequence of the XPRTs being used around the world is that it's not uncommon to see results from devices sold in different markets. As we've said before, many things can influence benchmark results, so don't assume that two devices with the same name are identical.

Kudos to both AnandTech and Notebookcheck for their care in presenting the system information for the devices in their reviews. The AnandTech review even included a brief description of the two models of the P8lite. This type of information is essential for helping people make informed decisions.

In other news, Windows 10 launched yesterday. We’re looking forward to seeing the TouchXPRT and WebXPRT results!

Eric

Seeing the whole picture

In past posts, we’ve discussed how people tend to focus on hardware differences when comparing performance or battery life scores between systems, but software factors such as OS version, choice of browser, and background activity often influence benchmark results on multiple levels.

For example, AnandTech recently published an article explaining how a decision by Google Chrome developers to speed up Web page rendering may have introduced a tradeoff between performance and battery life. To increase performance, Chrome asks Windows to use a 1ms timer interrupt interval instead of the default 15.6ms. While other applications wait for the default interval, Chrome wakes the system more often and gets its work done sooner.
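
The article doesn't include code, but for the curious, here's a minimal C sketch of the general mechanism: a Windows application requesting a finer timer resolution through the multimedia timer API. This illustrates the technique, not Chrome's actual source.

    /* Minimal sketch: requesting a 1ms Windows timer interrupt interval
       via the multimedia timer API. Link against winmm.lib. */
    #include <windows.h>
    #include <mmsystem.h>
    #include <stdio.h>

    int main(void)
    {
        /* Ask Windows for 1ms timer interrupts instead of the
           default ~15.6ms. The request is system-wide, so every
           running application feels the higher interrupt rate. */
        if (timeBeginPeriod(1) != TIMERR_NOERROR) {
            fprintf(stderr, "1ms timer resolution not available\n");
            return 1;
        }

        /* Latency-sensitive work goes here. Sleep(1) now wakes after
           roughly 1ms instead of rounding up to the next tick. */
        Sleep(1);

        /* Undo the request; a persistent 1ms interrupt rate keeps the
           CPU from reaching deep sleep states and costs battery life. */
        timeEndPeriod(1);
        return 0;
    }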

The tradeoff for that increased performance is that waking the OS more frequently can diminish the effectiveness of a system's built-in power-saving features, such as the tickless kernel and timer coalescing in Windows 8, or efficiency innovations in a new chip architecture. In this case, because of the OS-level interactions between Chrome and Windows, a faster browser could end up having a greater impact on battery life than you might initially suspect.

The article discusses the limitations of the test in detail, particularly Chrome 36's inability to natively support the same HiDPI resolution as the other browsers, but the point we're drawing out here is that accurate testing means taking all relevant factors into consideration. People are used to the idea that changing browsers may affect Web performance, but much less is said about a browser's impact on battery life.

Justin

It’s all in the presentation

The comment period for BatteryXPRT Community Preview 2 (CP2) ended on Monday. Now we're in the final sprint to release the benchmark.

The extensive testing we've been doing has meant staring at a lot of numbers, and that has led us to change how we present the results. As you would expect, the battery life you measure while running the test over Wi-Fi differs from the battery life you measure over a cellular network. Although individual devices vary, the difference is in the vicinity of 10 percent, about the same as the difference between Airplane mode and Wi-Fi.

BatteryXPRT has always captured a device's network setting in its disclosure information, but had not displayed that setting alongside the results. Because we found it so helpful to see the two together, we have changed the presentation of the results to recognize three modes: Airplane, Wi-Fi, and Cellular. We hope this will avoid confusion as people use BatteryXPRT.

Note that we have not changed the way the results are calculated. Results you generated during the preview are still valid. However, results from one mode should not be compared to results from another mode.
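
To make that last point concrete, here's a hypothetical C sketch of what tagging each result with its network mode looks like. The type and function names are made up for illustration; this is not BatteryXPRT's code.

    /* Hypothetical sketch: tag each battery-life result with the
       network mode it was measured under, so results from different
       modes are never compared directly. */
    #include <stdio.h>

    typedef enum { MODE_AIRPLANE, MODE_WIFI, MODE_CELLULAR } NetMode;

    typedef struct {
        double hours;  /* estimated battery life */
        NetMode mode;  /* mode the test ran under */
    } Result;

    /* A comparison is meaningful only within a single mode. */
    int comparable(Result a, Result b) { return a.mode == b.mode; }

    int main(void)
    {
        Result wifi = { 9.2, MODE_WIFI };
        Result cell = { 8.3, MODE_CELLULAR };
        if (!comparable(wifi, cell))
            puts("Different modes: don't compare these results.");
        return 0;
    }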

We’ve been talking a lot about BatteryXPRT, but TouchXPRT is also looking great! We’re looking forward to releasing both of them soon!

Eric

Staying out in the open

Back in July, AnandTech publicized some research about possible benchmark optimizations in the Galaxy S4. Yesterday, AnandTech published a much more comprehensive article, “The State of Cheating in Android Benchmarks.” It’s well worth the read.

AnandTech doesn’t accuse any of the benchmarks of being biased; it’s the OEMs who are supposedly doing the optimizing. I will note that none of the XPRT benchmarks are among the whitelisted CPU tests. That said, I imagine everyone in the benchmark game is concerned about any implication that their benchmark could be biased.

When I was a kid, my parents taught me that it’s a lot harder to cheat in the open. This is one of the reasons we believe so strongly in the community model for software development. Because the source code is available to anyone who joins the community, it’s impossible to hide any biases. At the same time, the model allows us to control derivative works, which is necessary to prevent biased versions of the benchmarks from being published. We think the community model strikes the right balance.

However, any time there is a system, someone will try to game it. We’ll always be on the lookout for optimizations that happen outside the benchmarks.

Eric

Lies, damned lies, and statistics

No one knows who first said “lies, damned lies, and statistics,” but it’s easy to understand why they said it. It’s no surprise that the bestselling statistics book in history is titled How to Lie with Statistics. While the title is facetious, it is certainly true that statistics can be confusing—consider the word “average,” which can refer to the mean, median, or mode. “Mean average,” in turn, can refer to the arithmetic mean, the geometric mean, or the harmonic mean. It’s enough to make a non-statistician’s head spin.
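
To make the ambiguity concrete, here's a small C example that computes all three “mean averages” of the same data set; the numbers are made up purely for illustration.

    /* Three different "mean averages" of the same values. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double x[] = { 2.0, 4.0, 8.0 };
        const int n = sizeof x / sizeof x[0];
        double sum = 0.0, logsum = 0.0, invsum = 0.0;

        for (int i = 0; i < n; i++) {
            sum    += x[i];        /* for the arithmetic mean */
            logsum += log(x[i]);   /* for the geometric mean  */
            invsum += 1.0 / x[i];  /* for the harmonic mean   */
        }

        printf("arithmetic: %.4f\n", sum / n);         /* 4.6667 */
        printf("geometric:  %.4f\n", exp(logsum / n)); /* 4.0000 */
        printf("harmonic:   %.4f\n", n / invsum);      /* 3.4286 */
        return 0;
    }

Same data, three defensible answers; for any set of positive values, the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean.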

In fact, a number of people have been confused by the confidence interval WebXPRT reports. We believe that the best way to stand behind your results is to be completely open about how you crunch the numbers. To this end, we released the white paper WebXPRT 2013 results calculation and confidence interval this past Monday.

This white paper, which does not require a background in mathematics, explains what the WebXPRT confidence interval is and how it differs from the benchmark variability we sometimes talk about. The paper also gives an overview of the statistical and mathematical techniques WebXPRT uses to translate the raw timing numbers into results.
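
The white paper is the authoritative description of WebXPRT's math, but as a generic illustration of the idea, the sketch below turns a handful of made-up timing measurements into a mean and a 95 percent confidence interval. This is not the exact WebXPRT procedure.

    /* Generic sketch: mean and 95% confidence interval for repeated
       timing measurements. Sample timings are invented; this is an
       illustration, not WebXPRT's actual calculation. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double t_ms[] = { 812.0, 798.0, 805.0, 821.0, 809.0 };
        const int n = sizeof t_ms / sizeof t_ms[0];

        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += t_ms[i];
        double mean = sum / n;

        double ss = 0.0;  /* sum of squared deviations from the mean */
        for (int i = 0; i < n; i++)
            ss += (t_ms[i] - mean) * (t_ms[i] - mean);
        double sd = sqrt(ss / (n - 1));  /* sample standard deviation */

        /* 1.96 is the normal-approximation multiplier for 95%; with
           this few runs a real analysis would use a t value instead. */
        double half = 1.96 * sd / sqrt((double)n);

        printf("mean = %.1f ms, 95%% CI = [%.1f, %.1f] ms\n",
               mean, mean - half, mean + half);
        return 0;
    }

Roughly speaking, a tighter interval means the individual runs agreed more closely.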

Because sometimes the devil is in the details, we wanted to augment our overview by showing exactly how WebXPRT calculates results. The white paper is accompanied by a spreadsheet that reproduces the calculations WebXPRT uses. If you are mathematically inclined and would like to suggest improvements to the process, by all means let us know!

Eric

Keep them coming!

Questions and comments have continued to come in since last week's webinar. Here are a few of them:

  • How long are results valid? As reviewers, we need to know that we can reuse results for a reasonable length of time. There is a tension between keeping results stable and keeping the benchmark current enough for the results to be relevant. Historically, HDXPRT allowed at least a year between releases. Based on the feedback we’ve received, a year seems like a reasonable length of time.
  • Is HDXPRT operable from the command line? (asked by a community member with a scripted suite of tests) HDXPRT 2012 is not, but we will consider adding a command-line interface to HDXPRT 2013. While most casual users don’t need one, a command-line interface could be very valuable to those of us using HDXPRT in labs.
  • I would be hesitant to overemphasize the running time of HDXPRT. The more applications it runs, the better it can differentiate systems and the more interesting it is to those of us who run it at a professional level. If I could say, “This gives a complete overview of the performance of this system,” that would actually save time. This comment was a surprise, given the amount of feedback we received saying that HDXPRT was too large. However, it gets to the heart of why we all need to be careful as we consider which applications to include in HDXPRT 2013.

If you missed the webinar, a recording is available on the BenchmarkXPRT 2013 Webinars page.

We’re planning to release the HDXPRT 2013 RFC next week. We’re looking forward to your comments.

Eric
