
Category: What makes a good benchmark?

A clarification from Brett Howse

A couple of weeks ago, I described a conversation I had with Brett Howse of AnandTech. Brett was kind enough to send a clarification of some of his remarks, which he gave us permission to share with you.

“We are at a point in time where the technology that’s been called mobile since its inception is now at a point where it makes sense to compare it to the PC. However we struggle with the comparisons because the tools used to do the testing do not always perform the same workloads. This can be a major issue when a company uses a mobile workload, and a desktop workload, but then puts the resulting scores side by side, which can lead to misinformed conclusions. This is not only a CPU issue either, since on the graphics side we have OpenGL well established, along with DirectX, in the PC space, but our mobile workloads tend to rely on OpenGL ES, with less precision asked of the GPU, and GPUs designed around this. Getting two devices to run the same work is a major challenge, but one that has people asking what the results would be.”

I really appreciate Brett taking the time to respond. What are your thoughts on these issues? Please let us know!

Eric

Comparing apples and oranges?

My first day at CES, I had breakfast with Brett Howse from AnandTech. It was a great opportunity to get the perspective of a savvy tech journalist and frequent user of the XPRTs.

During our conversation, Brett raised concerns about comparing mobile devices to PCs. As mobile devices get more powerful, the performance and capability gaps between them and PCs are narrowing. That makes it more common to compare upper-end mobile devices to PCs.

People have long used different versions of benchmarks when comparing these two classes of devices. For example, the images for benchmarking a phone might be smaller than those for benchmarking a PC. Also, because of processor differences, the benchmarks might be built differently, say a 16- or 32-bit executable for a mobile device, and a 64-bit version for a PC. That was fine when no one was comparing the devices directly, but can be a problem now.

This issue is more complicated than it sounds. For those cases where a benchmark uses a dumbed-down version of the workload for mobile devices, comparing the results is clearly not valid. However, let’s assume that the workload stays the same, and that you run a 32-bit benchmark on a tablet and a 64-bit version on a PC. Is the comparison valid? It may be, if you are talking about the day-to-day performance a user is likely to encounter. However, it may not be valid if you are making a statement about the potential performance of the device itself.
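One rough way to picture the problem is to treat every score as attached to the workload configuration that produced it, and to put two scores side by side only when those configurations match. The sketch below is purely hypothetical; the field names and numbers are invented, and it is not how any XPRT benchmark is implemented.

```typescript
// Hypothetical sketch: tag each score with the workload configuration that
// produced it, and only compare scores whose workloads match.

interface Result {
  device: string;
  score: number;
  workload: { imageSizePx: number; bitness: 32 | 64 };
}

function comparable(a: Result, b: Result): boolean {
  // Same input data and same build type are the minimum for a direct comparison.
  return (
    a.workload.imageSizePx === b.workload.imageSizePx &&
    a.workload.bitness === b.workload.bitness
  );
}

// Made-up numbers, just for illustration.
const tablet: Result = {
  device: "tablet",
  score: 180,
  workload: { imageSizePx: 1024, bitness: 32 },
};
const pc: Result = {
  device: "PC",
  score: 410,
  workload: { imageSizePx: 4096, bitness: 64 },
};

if (!comparable(tablet, pc)) {
  console.log("Different workloads: these scores are not directly comparable.");
}
```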

Brett would like the benchmarking community to take charge of this issue and provide guidance about how to compare mobile devices and PCs. What are your thoughts?

Eric

The XPRT Women Code-a-Thon

As Justin explained last week, we’ve resolved the issue we found with the TouchXPRT Community Preview (CP). I’m happy to say that the testing went well and that we released CP3 this week.

It’s been only three weeks since we announced the XPRT Weekly Tech Spotlight, and we already have another big announcement! Principled Technologies has joined with ChickTech Seattle to host the first ever XPRT Women Code-a-Thon! In this two-day event, participants will compete to create the best new candidate workload for WebXPRT or MobileXPRT. The workloads can’t duplicate existing workloads, so we are looking forward to seeing the new ideas.

Judges will study all the workloads and award prizes to the top three: $2,500 for first place, $1,500 for second place, and $1,000 for third place. Anyone interested can register here.

PT and the BenchmarkXPRT Development Community are committed to promoting the advancement of women in STEM, and we also benefit from doing good. As with the NCSU senior project, the BenchmarkXPRT Development Community will get some fresh perspectives and some new experimental test tools. Everyone wins!

So much has happened in 2016, and January isn’t even over yet. The year is off to a great start!

Eric

In the spotlight

I’m happy to be back in North Carolina, but I had a really great time at CES. I talked to over a dozen companies about the XPRTs and the XPRT Weekly Tech Spotlight, and had some good conversations. Hopefully, some of these companies’ devices will be among the first ones we showcase when the XPRT Weekly Tech Spotlight goes live next month.

Of course, I saw some really great tech at CES! Amazing TVs and cars, magic mirrors, all kinds of drones, and the list goes on. Before the show, the Internet of Things was predicted to be big this year, and boy, was it! Smart refrigerators, door locks, and thermostats were just the beginning. Some of my favorite examples were the chopsticks and footbath—both Bluetooth enabled—and “the world’s first remote controlled game shoe.”

Clearly, IoT is the Wild West of technology right now. We’ve had some conversations about how the XPRTs might be able to help consumers navigate the chaos. However, with a class of products this diverse, there are a lot of issues to consider. If you have any thoughts about this, let us know!

Eric

Auf Deutsch

Early next week, we will update WebXPRT by adding a German UI. This brings the number of available languages to three. WebXPRT has had a Simplified Chinese UI for a while, but you had to click a link on the WebXPRT page to get it. The new version removes that limitation, and lets you select Simplified Chinese, English, or German from the UI.

[Screenshot: the WebXPRT ’15 UI in German]

We’re working on getting WebXPRT to automatically detect the language of your device, but for now, the UI defaults to English.
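For anyone curious what automatic detection might look like, here is a minimal sketch, assuming a browser environment and the three languages mentioned above. It is not WebXPRT’s actual implementation; the language list and the English fallback simply follow this post.

```typescript
// Minimal sketch, assuming a browser environment. Not WebXPRT's actual code.

const supportedLanguages = ["en", "de", "zh-CN"]; // English, German, Simplified Chinese

function pickUiLanguage(): string {
  // navigator.language reports the browser's preferred language, e.g. "de-DE".
  const preferred = (navigator.language || "en").toLowerCase();

  // Try an exact match first (e.g. "zh-cn"), then fall back to matching the
  // primary subtag alone (e.g. "de" from "de-AT"), then default to English.
  const exact = supportedLanguages.find((l) => l.toLowerCase() === preferred);
  if (exact) {
    return exact;
  }
  const primary = preferred.split("-")[0];
  return supportedLanguages.find((l) => l.split("-")[0] === primary) ?? "en";
}

console.log(pickUiLanguage()); // e.g. "de" on a German-language device
```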

We would like to expand the range of languages the XPRTs support over time. This is an area where you can help. If you’d like to see your language represented and are willing to help with translation, please let us know.

I know it’s the holiday season, but remember that CES will be here before we know it. I’m really looking forward to seeing the show, and I may have some big news to talk about while I’m there! If you’re planning to be at CES, send a message and let’s find a time to meet!

We will not have a blog post next week. Happy holidays!

Eric

Last week in the XPRTs
We published the December 2015 BenchmarkXPRT Development Community newsletter.
We added one new BatteryXPRT ’14 result.
We added nine new MobileXPRT ’13 results.
We added one new MobileXPRT ’15 result.
We added four new WebXPRT ’15 results.

Question we get a lot

“How come your benchmark ranks devices differently than [insert other benchmark here]?” It’s a fair question, and the reason is that each benchmark has its own emphasis and tests different things. When you think about it, it would be unusual if all benchmarks did agree.

To illustrate the phenomenon, consider this excerpt from a recent browser shootout in VentureBeat:

[Benchmark results from the VentureBeat browser shootout, not reproduced here]
While this looks very confusing, the simple explanation is that the different benchmarks are testing different things. To begin with, SunSpider, Octane, JetStream, Peacekeeper, and Kraken all measure JavaScript performance. Oort Online measures WebGL performance. WebXPRT measures both JavaScript and HTML5 performance. HTML5Test measures HTML5 compliance.

Even with benchmarks that test the same aspect of browser performance, the tests differ. Kraken and SunSpider both test the speed of JavaScript math, string, and graphics operations in isolation, but run different sets of tests to do so. Peacekeeper profiles the JavaScript from sites such as YouTube and Facebook.

WebXPRT, like the other XPRTs, uses scenarios that model the types of work people do with their devices.
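To picture why the rankings can flip, here is a deliberately simplified sketch in TypeScript. It is not taken from WebXPRT or any other benchmark; it just times two made-up tasks, one that isolates JavaScript math the way a micro-benchmark does, and one that mixes several kinds of work the way a scenario-style workload does. A browser that wins the first task can easily lose the second.

```typescript
// Rough illustration only; neither task is taken from any real benchmark.

function timeIt(label: string, task: () => void): void {
  const start = Date.now();
  task();
  console.log(`${label}: ${Date.now() - start} ms`);
}

// Micro-benchmark style: isolated JavaScript math in a tight loop.
timeIt("math loop", () => {
  let sum = 0;
  for (let i = 0; i < 5_000_000; i++) {
    sum += Math.sqrt(i) * Math.sin(i);
  }
  if (sum === Infinity) console.log(sum); // keep the result "live"
});

// Scenario style: a mixed task (build records, serialize, parse, search),
// closer in spirit to modeling a chunk of everyday work.
timeIt("mixed scenario", () => {
  const records = Array.from({ length: 50_000 }, (_, i) => ({
    id: i,
    name: `item-${i}`,
    tags: ["a", "b", "c"].map((t) => t + (i % 7)),
  }));
  const parsed = JSON.parse(JSON.stringify(records)) as typeof records;
  const hits = parsed.filter((r) => r.tags.includes("a3")).length;
  if (hits < 0) console.log(hits); // keep the result "live"
});
```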

It’s no surprise that the order changes depending on which aspect of the Web experience you emphasize, in much the same way that the most fuel-efficient cars might not be the ones with the best acceleration.

This is a bigger topic than we can deal with in a single blog post, and we’ll examine it more in the future.

Eric
