

More than the sum of its parts

There was a recent article in Bloomberg about phone maker ZTE’s increasing market share in the US. The article singled out one phone, the ZTE Maven, which costs about $60 (US).

This phrase jumped out at me: “a processor with capabilities somewhere between the iPhone 5 and 6.” The iPhone 5s could also fit that description. The ZTE Maven uses a 64-bit ARM Cortex-A53 processor running at 1.2 GHz. The Apple iPhone 5s uses the Apple A7 SoC, whose 64-bit Cyclone CPU runs at 1.3 GHz.

We decided to put that statement to the test. We ran WebXPRT 2015 on the ZTE Maven and its score was 47. The iPhone 5s scored 100. The Maven was not even close.

As we’ve said before, the performance of a device depends on more than its processor’s clock speed. For example, the ZTE Maven uses the Snapdragon 410 SoC, which was aimed at mid-level devices. The iPhone 5s uses the Apple A7, which was intended for higher-end devices. You can find side-by-side specs here.
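To put rough numbers on that, here’s a quick back-of-the-envelope calculation. This is a TypeScript sketch using only the scores and clock speeds quoted above:

```typescript
// Back-of-the-envelope math: WebXPRT 2015 points per GHz of rated clock.
// Scores and clock speeds are the ones quoted above.
const devices = [
  { name: "ZTE Maven (Snapdragon 410)", score: 47, clockGHz: 1.2 },
  { name: "Apple iPhone 5s (A7)", score: 100, clockGHz: 1.3 },
];

for (const d of devices) {
  console.log(`${d.name}: ${(d.score / d.clockGHz).toFixed(1)} points per GHz`);
}
// ZTE Maven (Snapdragon 410): 39.2 points per GHz
// Apple iPhone 5s (A7): 76.9 points per GHz
```

Clock for clock, the A7 delivers roughly twice the WebXPRT score, a gap the GHz figures on a spec sheet would never reveal.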

Be wary when you see unsupported performance claims. As this example shows, specs can appear comparable even when the actual performance of the devices differs considerably. A good benchmark can provide insights into performance that specs alone can’t.

Eric

Question we get a lot

“How come your benchmark ranks devices differently than [insert other benchmark here]?” It’s a fair question, and the reason is that each benchmark has its own emphasis and tests different things. When you think about it, it would be unusual if all benchmarks did agree.

To illustrate the phenomenon, consider this excerpt from a recent browser shootout in VentureBeat:

[Table of browser benchmark rankings from the VentureBeat article]

While this looks very confusing, the simple explanation is that the different benchmarks are testing different things. To begin with, SunSpider, Octane, JetStream, Peacekeeper, and Kraken all measure JavaScript performance. Oort Online measures WebGL performance. WebXPRT measures both JavaScript and HTML5 performance. HTML5Test measures HTML5 compliance.

Even with benchmarks that test the same aspect of browser performance, the tests differ. Kraken and SunSpider both test the speed of JavaScript math, string, and graphics operations in isolation, but run different sets of tests to do so. Peacekeeper profiles the JavaScript from sites such as YouTube and Facebook.

WebXPRT, like the other XPRTs, uses scenarios that model the types of work people do with their devices.
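To make the micro-test side of that contrast concrete, here’s a minimal sketch of the kind of isolated JavaScript kernel a suite like SunSpider or Kraken times. It’s written in TypeScript for illustration and is not code from either suite:

```typescript
// A toy micro-benchmark: time one math kernel in isolation.
// Real suites run many such kernels (math, string, crypto, graphics)
// and combine the timings into a score.
function mathKernel(iterations: number): number {
  let sum = 0;
  for (let i = 1; i <= iterations; i++) {
    sum += Math.sqrt(i) * Math.sin(i);
  }
  return sum;
}

const start = performance.now();
mathKernel(5_000_000);
console.log(`math kernel: ${(performance.now() - start).toFixed(1)} ms`);
```

A scenario-based test instead times a complete task, such as enhancing a set of photos, so JavaScript, HTML5 APIs, and rendering all contribute to the result.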

It’s no surprise that the order changes depending on which aspect of the Web experience you emphasize, in much the same way that the most fuel-efficient cars might not be the ones with the best acceleration.

This is a bigger topic than we can deal with in a single blog post, and we’ll examine it more in the future.

Eric

What’s in a name?

A couple of weeks ago, the German site Notebookcheck published a review of the Huawei P8lite. We were pleased to see they used WebXPRT 2015, and the P8lite got an overall score of 47. This week, AnandTech published their review of the Huawei P8lite. In their review, the P8lite got an overall score of 59!

Those scores are very different, but it was not difficult to figure out why. The P8lite comes in two versions, depending on your market. The version Notebookcheck used is based on HiSilicon’s Kirin 620 SoC, while the version AnandTech used is based on Qualcomm’s Snapdragon 615. It’s also worth noting that the phone Notebookcheck tested was running Android 5.0, while the phone AnandTech tested was running Android 4.4. With different hardware and different operating systems, it’s no surprise that the results were different.

One consequence of the XPRTs being used across the world is that it is not uncommon to see results from devices in different markets. As we’ve said before, many things can influence benchmark results, so don’t assume that two devices with the same name are identical.

Kudos to both AnandTech and Notebookcheck for their care in presenting the system information for the devices in their reviews. The AnandTech review even included a brief description of the two models of the P8lite. This type of information is essential for helping people make informed decisions.

In other news, Windows 10 launched yesterday. We’re looking forward to seeing the TouchXPRT and WebXPRT results!

Eric

Seeing the whole picture

In past posts, we’ve discussed how people tend to focus on hardware differences when comparing performance or battery life scores between systems, but software factors such as OS version, choice of browser, and background activity often influence benchmark results on multiple levels.

For example, AnandTech recently published an article explaining how a decision by Google Chrome developers to speed up Web page rendering may have introduced a tradeoff between performance and battery life. To increase performance, Chrome asks Windows to use a 1ms interrupt timer instead of the default 15.6ms timer. Unlike applications that wait for the default timer, Chrome gets scheduled more often and finishes its work sooner.

The tradeoff for that increased performance is that waking up the OS more frequently can diminish the effectiveness of a system’s innate power-saving attributes, such as a tick-less kernel and timer coalescing in Windows 8, or efficiency innovations in a new chip architecture. In this case, because of the OS-level interactions between Chrome and Windows, a faster browser could end up having a greater impact on battery life than might initially be suspected.
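A little arithmetic shows the scale of the change. This TypeScript sketch uses the two timer periods from the article:

```typescript
// Timer interrupts per second implied by each setting.
const defaultPeriodMs = 15.6; // Windows default timer period
const chromePeriodMs = 1.0;   // period Chrome requests

const defaultWakeups = 1000 / defaultPeriodMs; // ~64 per second
const chromeWakeups = 1000 / chromePeriodMs;   // 1,000 per second

console.log(`default: ~${defaultWakeups.toFixed(0)} wake-ups per second`);
console.log(`Chrome:  ${chromeWakeups} wake-ups per second`);
console.log(`ratio:   ~${(chromeWakeups / defaultWakeups).toFixed(1)}x`);
```

Roughly fifteen times as many wake-ups gives the CPU far fewer opportunities to sit in its low-power idle states, which is where the battery-life cost comes in.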

The article discusses the limitations of their test in detail, specifically with regard to Chrome 36’s inability to natively support the same HiDPI resolution as the other browsers, but the point we’re drawing out here is that accurate testing means taking all relevant factors into consideration. People are used to the idea that changing browsers may affect Web performance, but much less is said about a browser’s impact on battery life.

Justin


It’s all in the presentation

The comment period for BatteryXPRT CP2 ended on Monday. Now we are in the final sprint to release the benchmark.

The extensive testing we’ve been doing has meant that we’ve been staring at a lot of numbers. This has led us to make a change in how we present the results. As you would expect, the battery life when you run the test over Wi-Fi differs from the battery life when you run it over a cellular network. Although individual devices vary, the difference is in the vicinity of 10 percent, about the same as the difference between Airplane mode and Wi-Fi.

BatteryXPRT has always captured a device’s Wi-Fi setting in its disclosure information, but it had not displayed that setting alongside the results. Because we found it so helpful to see the two together, we have changed the presentation of the results to recognize three modes: Airplane, Wi-Fi, and Cellular. We hope this will avoid confusion as people use BatteryXPRT.

Note that we have not changed the way the results are calculated. Results you generated during the preview are still valid. However, results from one mode should not be compared to results from another mode.
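As a hypothetical sketch of what that separation implies (the type and function names below are ours, not BatteryXPRT’s), a comparison is only meaningful when the modes match:

```typescript
// Hypothetical illustration: each result carries its network mode,
// and two results are comparable only when their modes match.
type NetworkMode = "Airplane" | "Wi-Fi" | "Cellular";

interface BatteryResult {
  device: string;
  mode: NetworkMode;
  batteryLifeMinutes: number;
}

function comparable(a: BatteryResult, b: BatteryResult): boolean {
  return a.mode === b.mode;
}

const wifi: BatteryResult = { device: "Phone A", mode: "Wi-Fi", batteryLifeMinutes: 480 };
const cell: BatteryResult = { device: "Phone B", mode: "Cellular", batteryLifeMinutes: 430 };
console.log(comparable(wifi, cell)); // false: different modes, don't compare
```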

We’ve been talking a lot about BatteryXPRT, but TouchXPRT is also looking great! We’re looking forward to releasing both of them soon!

Eric


Staying out in the open

Back in July, AnandTech publicized some research about possible benchmark optimizations in the Galaxy S4. Yesterday, AnandTech published a much more comprehensive article, “The State of Cheating in Android Benchmarks.” It’s well worth the read.

AnandTech doesn’t accuse any of the benchmarks of being biased; it’s the OEMs who are supposedly doing the optimizations. I will note that none of the XPRT benchmarks are among the whitelisted CPU tests. That being said, I imagine that everyone in the benchmark game is concerned about any implication that their benchmark could be biased.
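For readers who haven’t seen the article: the mechanism it describes is simple. The device checks the identity of the running app against a hard-coded list of benchmarks and boosts clock speeds on a match. Here’s a hypothetical TypeScript sketch (the package names are invented for illustration):

```typescript
// Hypothetical illustration of benchmark whitelisting: boost the CPU
// governor only when a known benchmark app is in the foreground.
const benchmarkWhitelist = new Set([
  "com.example.benchmark.alpha", // invented package names
  "com.example.benchmark.beta",
]);

function governorFor(foregroundApp: string): "boost" | "normal" {
  return benchmarkWhitelist.has(foregroundApp) ? "boost" : "normal";
}

console.log(governorFor("com.example.benchmark.alpha")); // "boost"
console.log(governorFor("com.example.mail"));            // "normal"
```

Because the boost keys off the app’s identity rather than its workload, the benchmark sees performance that ordinary apps never get.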

When I was a kid, my parents taught me that it’s a lot harder to cheat in the open. This is one of the reasons we believe so strongly in the community model for software development. The source code is available to anyone who joins the community. It’s impossible to hide any biases. At the same time, it allows us to control derivative works. That’s necessary to avoid biased versions of the benchmarks being published. We think the community model strikes the right balance.

However, any time there is a system, someone will try to game it. We’ll always be on the lookout for optimizations that happen outside the benchmarks.

Eric

