
Category: What makes a good benchmark?

Digging deeper

From time to time, we like to revisit the fundamentals of the XPRT approach to benchmark development. Today, we’re discussing the need for testers and benchmark developers to consider the multiple factors that influence benchmark results. For every device we test, all of its hardware and software components have the potential to affect performance, and changing the configuration of those components can significantly change results.

For example, we frequently see significant performance differences between different browsers on the same system. In our recent recap of the XPRT Weekly Tech Spotlight’s first year, we highlighted an example of how testing the same device with the same benchmark can produce different results, depending on the software stack under test. In that instance, the Alienware Steam Machine entry included a WebXPRT 2015 score for each of the two browsers that consumers were likely to use. The first score (356) represented the SteamOS browser app in the SteamOS environment, and the second (441) represented the Iceweasel browser (a Firefox variant) in the Linux-based desktop environment. Including only the first score would have given readers an incomplete picture of the Steam Machine’s web-browsing capabilities, so we thought it was important to include both.

We also see performance differences between different versions of the same browser, a fact especially relevant to those who use frequently updated browsers, such as Chrome. In addition, even benchmarks that measure the same general area of performance, such as web browsing, usually test very different things.

OS updates can also have an impact on performance. Consumers might base a purchase on performance or battery life scores and end up with a device that behaves much differently when updated to a new version of Android or iOS, for example.

Other important factors in the software stack include pre-installed software, commonly referred to as bloatware, and the proliferation of apps that sap performance and battery life.

This is a much larger topic than we can cover in the blog. Let the examples we’ve mentioned remind you to think critically about, and dig deeper into, benchmark results. If we see published XPRT scores that differ significantly from our own results, our first question is always “What’s different between the two devices?” Most of the time, the answer becomes clear as we compare hardware and software from top to bottom.

Justin

A new reality

A while back, I wrote about a VR demo built by students from North Carolina State University. We’ve been checking it out over the last couple of months and are very impressed. This workload will definitely heat up your device! While the initial results look promising, this is still an experimental workload and it’s too early to use results in formal reviews or product comparisons.

We’ve created a page that tells all about the VR demo. As an experimental workload, the demo is available only to community members. As always, members can download the source as well as the APK.

We asked the students to try to build the workload for iOS as a stretch goal. They successfully built an iOS version, but this was at the end of the semester and there was little time for testing. If you want to experiment with iOS yourself, look at the build instructions for Android and iOS that we include with the source. Note that you will need Xcode to build and deploy the demo on iOS.

After you’ve checked out the workload, let us know what you think!

Finally, we have a new video featuring the VR demo. Enjoy!


Eric

Experience is the best teacher

One of the core principles that guide the design of the XPRT tools is that they should reflect the way real-world users use their devices. The XPRTs try to use applications and workloads that reflect what users do and the way that real applications function. How did we learn how important this is? The hard way—by making mistakes! Here’s one example.

In the 1990s, I was Director of Testing for the Ziff-Davis Benchmark Operation (ZDBOp). The benchmarks ZDBOp created for its technical magazines became the industry standards, because of both their quality and Ziff-Davis’ leadership in the technical trade press.

WebBench, one of the benchmarks ZDBOp developed, measured the performance of early web servers. We worked hard to create a tool that used physical clients and tested web server performance over an actual network. However, we didn’t pay enough attention to how clients actually interacted with the servers. In the first version of WebBench, the clients opened connections to the server, did a small amount of work, closed the connections, and then opened new ones.

When we met with vendors after the release of WebBench, they begged us to change the model. At that time, browsers opened relatively long-lived connections and did lots of work before closing them. Our model was almost the opposite of that. It put vendors in the position of having to choose between coding to give their users good performance and coding to get good WebBench results.
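To make the contrast concrete, here is a minimal Python sketch of the two client models; the host, request count, and function names are hypothetical illustrations, not actual WebBench code:

```python
import http.client

HOST = "example.com"  # placeholder host standing in for a server under test
REQUESTS = 5          # illustrative request count

def short_lived_connections():
    """Model used by the first WebBench: open a connection, do a little work,
    close it, and repeat."""
    for _ in range(REQUESTS):
        conn = http.client.HTTPConnection(HOST, timeout=10)
        conn.request("GET", "/")
        conn.getresponse().read()  # small amount of work per connection
        conn.close()               # tear the connection down every time

def persistent_connection():
    """Model closer to how browsers of that era behaved: one longer-lived
    connection that handles many requests before closing."""
    conn = http.client.HTTPConnection(HOST, timeout=10)
    for _ in range(REQUESTS):
        conn.request("GET", "/")
        conn.getresponse().read()  # reuse the same connection for each request
    conn.close()

if __name__ == "__main__":
    short_lived_connections()
    persistent_connection()
```

The distinction matters because connection setup and teardown dominate the cost of the short-lived model, so a server tuned to score well under it behaves very differently from one tuned for the persistent connections real browsers used.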

Of course, we were horrified by this, and worked hard to make the next version of the benchmark reflect more closely the way real browsers interacted with web servers. Subsequent versions of WebBench were much better received.

This is one of the roots from which the XPRT philosophy grew. We have tried to learn and grow from the mistakes we’ve made. We’d love to hear about any of your experiences with performance tools so we can all learn together.

Eric

Creating a machine-learning benchmark

Recently, we wrote about one of the most exciting emerging technology areas, machine learning, and the question of what role the XPRTs could play in the field.

Experts expect machine learning to be the analytics backbone of the IoT data explosion. It is a disruptive technology with potential to influence a broad range of industries. Consumer and industrial applications that take advantage of machine-learning advancements in computer vision, natural language processing, and data analytics are already available and many more are on the way.

Currently, there is no comprehensive machine-learning or deep-learning benchmark that includes home, automotive, industrial, and retail use cases. The challenge with developing a benchmark for machine learning is that these are still the early days of the technology. A fragmented software and hardware landscape and a lack of standardized implementations make benchmarking machine learning complex and challenging.

Based on the conversations we’ve had over the last few weeks, we’ve decided to take on that challenge. With the community’s help, of course!

As we outlined in a blog entry last month, we will work with interested folks in the community, key vendors, and academia to pull together what we are internally calling MLXPRT.

While the result may differ substantially from the existing XPRTs, we think the need for such a tool is great. Whether it will turn out to be a packaged tool or just sample code and workloads remains to be seen.

What we need most is your help. We need both general input about what you would like to see and any expertise you may have to offer. Let us know any questions you have or ways you can help.

On a related note, I’ll be at CES 2017 in Las Vegas during the first week of January. I’d love to meet and talk more about machine learning, benchmarking, or the XPRTs. If you’re planning to be there and would like to connect, let us know.

We will not have a blog entry next week over the holidays, so we wish all of you a wonderful time with your families and a great start to the new year.

Bill

HDXPRT’s future

While industry pundits have written many words about the death of the PC, Windows PCs are going through a renaissance. No longer do you just choose between a desktop and a laptop in beige or black. There has been an explosion of choices.

Whether you want a super-thin notebook, a tablet, or a two-in-one device, the market has something to offer. Desktop systems can be small devices on your desk, all-in-ones with the PC built into the monitor, or old-style boxes that sit on the floor. You can go with something inexpensive that will be sufficient for many tasks or invest in a super-powerful PC capable of driving today’s latest VR devices. Or you can get a new Microsoft Surface Studio, an example of the new types of devices entering the PC scene.

The current proliferation of PC choices means that tools that help buyers understand the performance differences between systems are more important than they have been in years. Because HDXPRT is one such tool, we expect demand for it to increase.

We have many tasks ahead of us as we prepare for this increased demand. The first is to release a version of HDXPRT 2014 that doesn’t require a patch. We are working on that and should have something ready later this month.

For the other tasks, we need your input. We believe we need to update HDXPRT to reflect the world of high-definition content. It’s tempting to simply change the name to UHDXPRT, but this was our first XPRT and I’m partial to the original name. How about you?

As for tests, what should a 2017 version of HDXPRT include? We think 4K-related workloads are a must, but we aren’t sure whether 4K playback tests are the way to go. What do you think? We also need to update other content, such as photo and video resolutions, and replace outdated applications with current versions. Would a VR test be worthwhile?

Please share your thoughts with us over the coming weeks as we put together a plan for the next version of HDXPRT!

Bill

An exciting milestone for WebXPRT!

If you’re familiar with the run counter on WebXPRT.com, you may have noticed that WebXPRT recently passed a pretty significant milestone. Since we released WebXPRT 2013, users running WebXPRT 2013 and 2015 have successfully completed over 100,000 runs!

We’re thrilled about WebXPRT’s ongoing popularity, and we think that it’s due to the benchmark’s unique combination of characteristics: it’s easy to run, it runs quickly and on a wide variety of platforms, and it evaluates device performance using real-world tasks. Manufacturers, developers, consumers, and media outlets in more than 358 cities, from Aberdeen to Zevenaar, and 57 countries, from Argentina to Vietnam, have used WebXPRT’s easy-to-understand results to compare how well devices handle everyday tasks. WebXPRT has definitely earned its reputation as a “go-to” benchmark.

If you haven’t run WebXPRT yet, give it a try. The test is free and runs in almost any browser.

We’re grateful to everyone who’s helped us reach this milestone. Here’s to another 100,000 runs!

Justin
