

What is truth?

Last fall, we discussed AnandTech’s report on benchmark cheating and why open and honest benchmark development is so important. This week, benchmark optimization is back in the news, as AnandTech reports that the HTC One M8 boosts its performance on whitelisted benchmarks. CNET has quoted HTC admitting that it not only boosted performance, but also promoted the boost as a feature.

However, HTC has gone a step further, giving users the option to manually set the phone to high-performance mode. Some of us at PT have been involved in developing benchmarks for over 20 years, and it’s always been true that one person’s cheat can be another person’s valid optimization. Whatever its motivation, HTC’s position – that some people will choose higher performance over longer battery life – is not necessarily wrong.

BatteryXPRT recognizes that there’s a tradeoff between performance and battery life, and that you shouldn’t penalize a fast system the same way you would a system that simply has poor battery life. That’s why it reports a performance score along with the estimated battery life.

Do you have thoughts on optimizations, cheating, or ways to make the benchmarks better? Please drop us a line at BenchmarkXPRTSupport@principledtechnologies.com.

Eric


Golden tickets

We (Bill and Mark) are on our way home from CES. There were lots of cool things to see, from electric cars to health and fitness wearables to all manner of mobile devices. And more, a whole lot more.

We enjoyed seeing many of those products, but that was not our primary mission at the show. Our main goal was to spread the word about the XPRT benchmarks. We did that by visiting multiple mobile-device makers and giving many of them a very special golden ticket. Yes, we’re talking about a physical, Willy Wonka-style golden ticket. The two-sided tickets look really cool.

One side invites folks to be heard by joining the BenchmarkXPRT community. The other offers them the opportunity to have PT test devices for free with all the applicable XPRT benchmarks. All a vendor has to do to get this free testing is send the device to PT. We hope to get many devices in-house and to provide a great many results on our Web sites.

We wore the new BenchmarkXPRT shirts as we walked the floor.

We will soon be sending one shirt—and one golden ticket—to each member of the community. Please make sure we have your latest mailing address so we can ship those to you.

-Bill & Mark


Looking for a winner

This week, PT published its first two public reports using HDXPRT 2012: “Performance comparison: Dell Latitude E5430 vs. HP ProBook 4440s” and “Performance comparison: Dell Latitude E5430 vs. Lenovo ThinkPad L430.” You should check them out.

Of course, you can find the HDXPRT results from these reports in the HDXPRT 2012 results database, along with results from the characterization study we did last month. The results database is a repository of HDXPRT results you can use to compare system performance. It includes full disclosure information and lets you sort by a number of criteria, including any HDXPRT score, processor, amount of RAM, graphics card, and so on.

Looking at the results in the database got me wondering who has the mightiest machine out there. The current winner is a custom-built system with an Intel Core i7 3770 and 8 GB of RAM. It has an HDXPRT 2012 Create HD score of 248.

Records are meant to be broken, and I know someone out there can grind that score to dust.  So, we’re going to have a contest. The first person to submit a set of HDXPRT results with a score above 248 will win at least bragging rights and maybe a prize if we can find something suitable around our offices.

You’ll find instructions for submitting results at “Submit your HDXPRT 2012 results.”

I can’t wait to see your results!

Eric


Benchmarking a benchmark

One of the challenges of any benchmark is understanding its characteristics. The goal of a benchmark is to measure performance under a defined set of circumstances. For system-level, application-oriented benchmarks, it isn’t always obvious how individual components in the system influence the overall score. For instance, how does doubling the amount of memory affect the benchmark score?

The best way to understand the characteristics of a benchmark is to run a series of carefully controlled experiments that change one variable at a time. To test the benchmark’s behavior with increased memory, you would take a system and run the benchmark with different amounts of RAM. Changing the processor, graphics subsystem, or hard disk lets you see the influence of those components. Some components, like memory, can change in both their amount and speed.

The full matrix of system components to test can quickly grow very large. While the goal is to change only one component at a time, this is not always possible. For example, you can’t change the processor from an Intel to an AMD without also changing the motherboard.
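To make that concrete, here is a minimal sketch in C of one way to enumerate a one-variable-at-a-time test matrix. The component names and option lists are hypothetical illustrations, not the actual HDXPRT test plan.

    #include <stdio.h>

    /* Hypothetical component options; the first entry in each list is
       the baseline configuration. */
    static const char *ram[]  = { "4 GB", "2 GB", "8 GB" };
    static const char *disk[] = { "7,200 RPM HDD", "5,400 RPM HDD", "SSD" };
    static const char *gpu[]  = { "integrated", "discrete" };

    /* Print one benchmark run per alternative, holding every other
       component at its baseline value. */
    static void vary(const char *name, const char **opts, int n)
    {
        for (int i = 1; i < n; i++)
            printf("Run: %s = %s, all else at baseline\n", name, opts[i]);
    }

    int main(void)
    {
        printf("Run: baseline (%s RAM, %s, %s graphics)\n",
               ram[0], disk[0], gpu[0]);
        vary("RAM",  ram,  3);
        vary("disk", disk, 3);
        vary("GPU",  gpu,  2);
        return 0;
    }

Even this tiny example yields six runs; add a few more components, options, and speeds, and the matrix balloons.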

We are in the process of putting HDXPRT 2011 through a series of such tests. HDXPRT 2011 is a system-level, application-oriented benchmark for measuring the performance of PCs on consumer-oriented HD media scenarios. We want to understand, and share with you, how different components influence HDXPRT scores. We expect to release a report on our findings next week. It will include results detailing the effect of processor speed, amount of RAM, hard disk type, and graphics subsystem.

There is a tradeoff between the size of the matrix and how long it takes to produce the results. We’ve tried to choose the areas we felt were most important, but we’d like to hear what you consider important. So, what characteristics of HDXPRT 2011 would you like to see us test?

Bill


Putting HDXPRT in some benchmark context

Benchmarks come in many shapes and sizes.  Some are extremely small, simple, and focused, while others are large, complex, and cover many aspects of a system.  To help position HDXPRT in the world of benchmarks, let me share with you a little taxonomy that Bill and I have long used.  No taxonomy is perfect, of course, but we’ve found this one to be very helpful as a general categorization tool.

From the perspective of how benchmarks measure performance, you can divide most of them into three groups.

Inspection tools use highly specialized tests to target very particular parts of a system. Back in the day, lo these many decades ago—okay, it was only two decades, but in dog years two tech decades is like five generations—some groups used a simple no-op loop to measure processor performance. I know, it sounds dumb today, but for a short time many felt it was a legitimate measure of processor clock speed, which is one aspect of performance. Similarly, if you wanted to know how fast a graphics subsystem could draw a particular kind of line, you could write code to draw lines of that type over and over.

These tools have very limited utility, because they don’t do what real users do, but for people working close to hardware, they can be useful.
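For a feel for what such an inspection tool looks like, here is a minimal sketch in C that times an empty loop, in the spirit of those old no-op tests. It is an illustration only, not code from any real benchmark.

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        const long iterations = 100000000L; /* 100 million empty iterations */
        volatile long i;                    /* volatile keeps the compiler
                                               from deleting the loop */
        clock_t start = clock();
        for (i = 0; i < iterations; i++)
            ;                               /* no-op loop body */
        clock_t end = clock();

        double seconds = (double)(end - start) / CLOCKS_PER_SEC;
        printf("%ld iterations in %.3f s\n", iterations, seconds);
        return 0;
    }

Running something like this tells you a little about loop throughput and nothing at all about how the system handles real applications, which is precisely the limitation of inspection tools.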

Moving closer to the real world, synthetic benchmarks are specially written programs that simulate the kinds of work their developers believe real users are doing. So, if you think your target users are spending all day in email, you could write your own mini email client and time functions in it.  These tools definitely move closer to real user work than inspection tools, but they still have the drawback of not actually running the programs real people are using.

Application-based benchmarks take that last step by using real applications, the same programs that users employ in the real world. These benchmarks cause those applications to perform the kinds of actions that real users take, and they time those actions.  You can always argue about how representative they are—more on that in a future blog entry, assuming I don’t forget to write it—but they are definitely closer to the real world because they’re using real applications.

With all of that background, HDXPRT becomes easy to classify:  it’s an application-based benchmark.

Mark Van Name


Our community’s goal

Computer system performance evaluation has a long and complex history. Many of the earliest tests were simple, short code snippets, such as Whetstone, that did little more than give an indication of how fast a particular computer subsystem could operate. Unfortunately, such simple benchmarks quickly lost their value, in part because they were very crude measures, and in part because the software tools on the systems they were measuring could easily optimize for them. In some cases, a compiler could even recognize a test and “optimize” the code by simply producing the final result!
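To see how easily that can happen, consider this minimal C sketch, which illustrates the general problem rather than any specific historical case. Every input to the loop is a compile-time constant, so an optimizing compiler is free to compute the answer at build time and delete the loop entirely; any timing wrapped around it would then measure nothing.

    #include <stdio.h>

    int main(void)
    {
        /* A naive "benchmark" kernel: the inputs are compile-time
           constants, so the compiler can fold the whole loop into its
           final value, leaving the timed region with no work to do. */
        long long sum = 0;
        for (long i = 0; i < 1000000L; i++)
            sum += i;
        printf("%lld\n", sum);
        return 0;
    }

With optimizations enabled, compilers commonly replace the loop with the precomputed constant 499999500000, exactly the “producing the final result” behavior described above.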

Over time, though, benchmarks have become more complex and more relevant. Whole organizations exist, and have existed, to build benchmarks. Notable ones include the Ziff-Davis Benchmark Operation (ZDBOp), which the Ziff-Davis computer magazines funded in the 1990s and which Mark and I ran; the Standard Performance Evaluation Corporation (SPEC), which its member companies fund and of which PT is a member; and the Business Applications Performance Corporation (BAPCo), which its member companies fund. Each of these organizations has developed widely used products, such as Winstone (ZDBOp), SPEC CPU (SPEC), and SYSmark (BAPCo). Each has also faced challenges. Ziff-Davis, for example, could no longer support the costs of developing its benchmarks, so it discontinued ZDBOp. SPEC continues to develop good benchmarks, but its process can sometimes mean years pass between versions.

The goal with HDXPRT and the HDXPRT Development Community (HDC) is to explore a new way to develop benchmarks. By drawing on the expertise and experience of a community of interested people, we hope to be able to develop benchmarks in an open and collaborative environment while keeping them timely.

HDXPRT 2011 is the first test of this approach. We believe that it, its successors, and other benchmarks developed this way will give the industry a new model for creating world-class performance measurement tools.

If you’re not a member of the HDC, please consider joining us and helping define the future of performance evaluation.

Bill

