BenchmarkXPRT Blog

Category: What makes a good benchmark?

Looking for the next big thing

We recently introduced a new Web form in the members’ area to make it easier for you to submit new benchmark ideas. We’ve already received some interesting suggestions:

  • A benchmark to assess performance, battery life, and Chrome-specific technologies on Chromebooks
  • A benchmark to evaluate camera features and photo quality on phones and tablets
  • A benchmark for measuring the performance of cloud services
  • A benchmark for measuring the performance and battery life of iOS-based devices

Are you interested in seeing any of these? Or do you have an idea no one has mentioned yet? We know there’s more out there! We like finding new things to measure and new ways to measure them, so please don’t hesitate to share your ideas!

Also, remember that the comment period for BatteryXPRT 2014 for Android CP2 ends on Monday, April 21. CP2 is the first XPRT to feature a Simplified Chinese UI. Please send in your comments. We’ll be aiming for a BatteryXPRT general release soon.

By the way, if you have a language you’d like to see and you’re willing to help with the translation, we’d love to talk to you!

Let us know what you think about potential new benchmarks, language options, or anything else on your mind at BenchmarkXPRTSupport@principledtechnologies.com.

Eric

Comment on this post in the forums

What is truth?

Last fall, we discussed AnandTech’s report on benchmark cheating, and why open and honest benchmark development is so important. This week, benchmark optimization is back in the news, as AnandTech says that the HTC One M8 boosts its performance in whitelisted benchmarks. CNET has quoted HTC admitting that they not only boosted performance, but promoted the boost as a feature.

However, HTC has gone a step further, giving users the option to manually set the phone to high-performance mode. Some of us at PT have been involved in developing benchmarks for over 20 years, and it’s always been true that one person’s cheat can be another person’s valid optimization. Whatever their motivation, HTC’s position – that some people will choose higher performance over longer battery life – is not necessarily wrong.

BatteryXPRT recognizes that there’s a tradeoff between performance and battery life, and that you shouldn’t penalize a fast system the same way you would a system that simply has poor battery life. That’s why it reports a performance score along with the estimated battery life.

Do you have thoughts on optimizations, cheating, or ways to make the benchmarks better? Please drop us a line at BenchmarkXPRTSupport@principledtechnologies.com.

Eric

It’s always worth asking

Last week, one of our community members asked for a couple of enhancements to WebXPRT. They wanted WebXPRT to be easier to automate, and they made two specific requests:

  • Add debug/result logs
  • Add the ability to start the test without UI interactions, by using a specific URL or a command line

This is a great example of why we put so much emphasis on the community. We have tried to make the BenchmarkXPRT benchmarks easy to use, but we don’t always face the same testing demands you do. If there’s anything we can do to make these tools more valuable, please let us know by posting on the forums or e-mailing us at benchmarkxprtsupport@principledtechnologies.com.

We are adding those abilities to the upcoming WebXPRT 2014 community preview. Speaking of the community preview, we have been working hard on it, and in the next few weeks, we’ll be talking about what will be in it.

Keep those requests coming!

Eric

Sounds easy, but…

In Endurance, Bill said that we were going to be investigating battery life testing. He also discussed some of the issues that make battery testing difficult to do well. Finally, he explained why we were looking at MobileXPRT as the basis for the first version of the battery life test.

Over the last couple of months, we have been experimenting with a number of different approaches to battery testing. We now think that we have enough empirical data that we can make a proposal. We are working on that now. It should be available to community members in the next couple of weeks.

We hope you’ll look at the proposal and let us know what you think. Your input is an essential part of developing a really great test. If you’re not a member of the community, it’s easy to join.

In other news, we’re going to CES and would love to talk with you. If you’d like to chat, send an e-mail to benchmarkxprtsupport@principledtechnologies.com.

Eric

There is such a thing as too much

There’s been a lot of excitement about TouchXPRT recently. However, we haven’t been ignoring HDXPRT. On November 9, we released a patch that lets HDXPRT support Windows 8. We’ve now integrated the patch into HDXPRT 2012, so all copies of HDXPRT 2012 going forward will install on Windows 8 without the need for a separate step.

As promised, we will be releasing the source code for HDXPRT 2012. We anticipate having it available for community members by December 14.

During the comment period for HDXPRT, this message came through loud and clear: HDXPRT 2012 is too big and takes too long to run. So we are working hard to find the best way to reduce the number of applications and scenarios. While we want to make the benchmark smaller and faster, we want to make sure that HDXPRT 2013 is comprehensive enough to provide useful performance metrics for the greatest number of people.

We’re working toward having an RFC in late January that will define a leaner, meaner HDXPRT 2013 and will reflect the other comments we have received as well. If you have thoughts about which applications and scenarios are most important to you, please let us know.

In other news, CES is coming in January, and Principled Technologies will be there! Once again, Bill is hoping to meet with as many of you in the Development Community as possible. We’ll have a suite at the Hilton and would love for you to come, kick back, and talk about HDXPRT, TouchXPRT, the future of benchmarks, or about the cool things you’ve seen at the show. (Bill loves talking about gadgets. Last year, he went into gadget overload!)

If you plan to be at CES, but are stuck working a booth or suite, let us know and Bill will try to stop by and say hi. Drop us an email at hdxrpt_CES@principledtechnologies.com and we will set up an appointment.

Finally, we’re really excited about the big changes at the Principled Technologies Web site. The new Web site gives us a lot of opportunities. Over the next few weeks, we’ll be looking at ways the Development Community can take advantage of them.

Eric

The real art of benchmarking

In my last blog entry, I noted the challenge of balancing real-world and real-science considerations when benchmarking Web page loads. That issue, however, is inherent in all benchmarking. Real world argues for benchmarks that emphasize what users and computers actually do. For servers, that might mean something like executing real database transactions against a real database from real client computers. For tablets, that might mean real fingers selecting and displaying real photos. There are obvious issues with both—setting up such a real database environment is difficult and who wants to be the owner of the real fingers driving the tablet? It is also difficult to understand what causes performance differences—is it the network, the processors, or the disks in the server? There are also more subtle challenges, such as how to make the tests work on servers or tablets other than the original ones. Worse, such real-world environments are subject to all sorts of repeatability and reproducibility issues.

Real science, on the other hand, argues for benchmarks that emphasize repeatable and reproducible results. Further, real science wants benchmarks that isolate the causes of performance differences. For servers, that might mean a suite of tests targeting processor speed, network bandwidth, and disk transfer rate. For tablets, that might mean tests targeting processor speed, touch responsiveness, and graphics-rendering rate. The problem is that it is not always obvious what combination of such factors actually delivers better database server performance or tablet experience. Worse, it is possible that different databases and transactions would have very different performance characteristics that these tests don’t measure at all.

The good news is that real world and real science are not always in opposition. The bad news is that a third factor exacerbates the situation—benchmarks take real time (and of course real money) to develop. That means benchmark developers need to make compromises if they want to bring tests to market before the real world they are attempting to measure has changed. And, they need to avoid some of the most difficult technical hurdles. Like most things, that means trying to find the right balance between real world and real science.

Unfortunately, there is no formula for determining that balance. Instead, it really is somewhat of an art. I’d love to hear your examples of benchmarks (current or from the past) that you think do a good job of striking this balance and showing the real art of benchmarking.

Bill
