
Category: What makes a good benchmark?

Auf Deutsch

Early next week, we will update WebXPRT by adding a German UI. This brings the number of available languages to three. WebXPRT has had a Simplified Chinese UI for a while, but you had to click a link on the WebXPRT page to get it. The new version removes that limitation, and lets you select Simplified Chinese, English, or German from the UI.

WebXPRT '15 German

We’re working on getting WebXPRT to automatically detect the language of your device, but for now, the UI defaults to English.

We would like to expand the range of languages the XPRTs support over time. This is an area where you can help. If you’d like to see your language represented and are willing to help with translation, please let us know.

I know it’s the holiday season, but remember that CES will be here before we know it. I’m really looking forward to seeing the show, and I may have some big news to talk about while I’m there! If you’re planning to be at CES, send a message and let’s find a time to meet!

We will not have a blog post next week. Happy holidays!

Eric

Last week in the XPRTs
We published the December 2015 BenchmarkXPRT Development Community newsletter.
We added one new BatteryXPRT ’14 result.
We added nine new MobileXPRT ’13 results.
We added one new MobileXPRT ’15 result.
We added four new WebXPRT ’15 results.

Question we get a lot

“How come your benchmark ranks devices differently than [insert other benchmark here]?” It’s a fair question, and the reason is that each benchmark has its own emphasis and tests different things. When you think about it, it would be unusual if all benchmarks did agree.

To illustrate the phenomenon, consider this excerpt from a recent browser shootout in VentureBeat:

[Chart: benchmark rankings excerpt from VentureBeat]
While this looks very confusing, the simple explanation is that the different benchmarks are testing different things. To begin with, SunSpider, Octane, JetStream, Peacekeeper, and Kraken all measure JavaScript performance. Oort Online measures WebGL performance. WebXPRT measures both JavaScript and HTML5 performance. HTML5Test measures HTML5 compliance.

Even with benchmarks that test the same aspect of browser performance, the tests differ. Kraken and SunSpider both test the speed of JavaScript math, string, and graphics operations in isolation, but run different sets of tests to do so. Peacekeeper profiles the JavaScript from sites such as YouTube and Facebook.
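To make the idea concrete, here is a toy sketch (these loops are invented illustrations, not tests from any shipping benchmark) of how two composite scores that weight the same measurements differently can rank the same engine differently:

```javascript
// Two microbenchmarks that stress different operations. An engine that is
// fast at math but slow at string handling will look better on a
// math-weighted composite than on a string-weighted one.
function timeIt(fn) {
  const start = Date.now();
  fn();
  return Date.now() - start; // elapsed milliseconds
}

function mathWorkload() {
  let x = 0;
  for (let i = 1; i < 1e6; i++) x += Math.sqrt(i);
  return x;
}

function stringWorkload() {
  let s = '';
  for (let i = 0; i < 2e4; i++) s += i.toString(36);
  return s.length;
}

const mathMs = timeIt(mathWorkload);
const stringMs = timeIt(stringWorkload);

// Hypothetical composites: one benchmark weights math 90/10, another
// weights strings 90/10. Lower is better for both, but the two scores
// can order a set of devices differently.
const mathHeavyScore = 0.9 * mathMs + 0.1 * stringMs;
const stringHeavyScore = 0.1 * mathMs + 0.9 * stringMs;
```

Neither composite is "wrong"; they simply emphasize different parts of the workload, which is exactly why rankings diverge.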

WebXPRT, like the other XPRTs, uses scenarios that model the types of work people do with their devices.

It’s no surprise that the order changes depending on which aspect of the Web experience you emphasize, in much the same way that the most fuel-efficient cars might not be the ones with the best acceleration.

This is a bigger topic than we can deal with in a single blog post, and we’ll examine it more in the future.

Eric

Upping our game

As we wrote last week, we’re releasing MobileXPRT 2015 to the public tomorrow. Thanks to everyone who helped make the community preview a success!

We’re working on the TouchXPRT 2016 design document and will make it available for the community to review soon.

As you know, we’re always looking for initiatives that could improve our game. One is creating experimental tests for future XPRTs. Experimental tests would allow us to maintain broad compatibility for each XPRT tool while giving testers an opportunity to evaluate cutting-edge technologies.

Another initiative involves looking for new partnerships with people who are not yet part of the community, but could add valuable input to the development process. It’s too soon to say much more about this, but we’re having fruitful conversations and hoping that these partnerships will grow the community even more!

If you have ideas about experimental tests, improving the XPRTs, or expanding the community, please let us know.

Eric

Explaining the BenchmarkXPRT Development Community

Over the last year, I’ve spoken about the XPRT benchmarks with people across America, in China (at IDF Shenzhen), and in Europe (at Mobile World Congress). I regularly had to explain how the BenchmarkXPRT Development Community works. While I was glad to do so, I wished that the many people I wasn’t able to talk with could also learn how the community works.

To help make that happen, we’ve developed a simple and engaging video. My (admittedly prejudiced) opinion is that it does a great job of explaining how the community works in less than two minutes. That’s a lot faster than I was able to explain it to folks!

We hope you enjoy the video. And we hope you’ll pass it along to other folks who aren’t already part of the community so they can learn how it works and hopefully be persuaded to join us. Thanks!

Bill

Mystery solved

As we mentioned a few weeks ago, the WebXPRT Local Notes test would not complete on recent builds of Windows 10 when using the Edge browser, while other browsers completed WebXPRT on the same builds without any problems.

We now know what is causing this behavior. The Local Notes test stores encrypted content in LocalStorage as UTF-16-encoded Unicode strings. The encrypted content can include code units that are not valid characters on their own. The current Edge implementation treats these characters as undefined and cannot store them. Other browsers may not have had an issue with the characters because of differences in the way they implement LocalStorage.

We’ve been able to work around this by using escape sequences for unsupported Unicode code points. Testing so far has not shown any perceptible change in results, so we believe that we will be able to make this change to WebXPRT without compromising the comparability of the results.
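As a rough sketch of that kind of workaround (the function names and escape format here are our own illustration, not WebXPRT’s actual code), the idea is to substitute a printable escape sequence for each problematic code unit before storing, and reverse the substitution on read:

```javascript
// Sketch: before writing a string to LocalStorage, replace surrogate
// code units (which some storage implementations reject when they are
// not part of a valid pair) with a printable "\uXXXX" escape, and
// restore them on read. For simplicity this escapes every surrogate,
// including valid pairs; the round trip still restores them exactly.
// Limitation: a string that already contains a literal "\uXXXX"
// sequence would be corrupted by unescaping, so real code would also
// escape the backslash itself.
function escapeUnsupported(str) {
  return str.replace(/[\uD800-\uDFFF]/g, function (ch) {
    return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

function unescapeUnsupported(str) {
  return str.replace(/\\u([0-9a-f]{4})/g, function (_, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}
```

The escaped form contains only ordinary printable characters, so any LocalStorage implementation can store it, and the round trip recovers the original string byte for byte.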

Because this issue affects both WebXPRT 2013 and WebXPRT 2015, we’re planning to update both versions. We’ll let you know as soon as they are available.

If you’d like more details about this issue and the fix, please let us know.

Eric

It’s not the same

We sometimes get questions about comparing results from older versions of benchmarks to the current version. Unfortunately, it’s never safe to compare results from different versions of a benchmark. This principle has been around much longer than the XPRTs: a major update will use different workloads and test data, and will probably be built with updated or different tools.

To avoid confusion, we rescale the results every time we release a new version of an existing benchmark. By making the results significantly different, we hope to reduce the likelihood that results from two different versions will get mixed together.
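One common way to implement that kind of rescaling (the calibration values below are invented for illustration; they are not WebXPRT’s actual calibration) is to divide each raw result by the raw result of a fixed calibration device and multiply by a chosen target score, so a new version’s scores land in a visibly different range:

```javascript
// Sketch: scale a raw benchmark result against a calibration baseline.
// rawResult       - the device's raw measurement for this version
// calibrationRaw  - the calibration device's raw measurement
// calibrationScore- the score assigned to the calibration device
function scaleScore(rawResult, calibrationRaw, calibrationScore) {
  return (rawResult / calibrationRaw) * calibrationScore;
}

// Example: assigning the calibration device a score of 100 in one
// version and 1000 in another keeps the two versions' score ranges
// far apart, even for identical raw measurements.
```

Picking a very different calibration score for each version makes it obvious at a glance which version produced a given result.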

As an example, we scaled the results from WebXPRT 2015 to be significantly lower than those from WebXPRT 2013. Here are some scores from the published results for WebXPRT 2013 and WebXPRT 2015.

WebXPRT 2013 vs. 2015 results

Please note that the results above are not necessarily from the same device configurations, and are meant only to illustrate the difference in results between the two versions of WebXPRT.

If you have any questions, please let us know.

Eric
