
Category: What makes a good benchmark?

Planning the next version of HDXPRT

A few weeks ago, we wrote about the capabilities and benefits of HDXPRT. This week, we want to share some initial ideas for the next version of HDXPRT, and invite you to send us any comments or suggestions you may have.

The first step towards a new HDXPRT will be updating the benchmark’s workloads to increase their value in the years to come. Primarily, this will involve updating application content, such as photos and videos, to more contemporary file resolutions and sizes. We think 4K-related workloads will increase the benchmark’s relevance, but aren’t sure whether 4K playback tests are necessary. What do you think?

The next step will be to update the versions of the real-world trial applications included in the benchmark, including Adobe Photoshop Elements, Apple iTunes, Audacity, CyberLink MediaEspresso, and HandBrake. Are there any other applications you feel would be a good addition to HDXPRT’s photo editing, music editing, or video conversion test scenarios?

We’re also planning to update the UI to improve the look and feel of the benchmark and simplify navigation and functionality.

Last but not least, we’ll work to fix known problems, such as the hardware acceleration settings issue in MediaEspresso, and eliminate the need for workarounds when running HDXPRT on the Windows 10 Creators Update.

Do you have feedback on these ideas or suggestions for applications or test scenarios that we should consider for HDXPRT? Are there existing features we should remove? Are there elements of the UI that you find especially useful or would like to see improved? Please let us know. We want to hear from you and make sure that HDXPRT continues to meet your needs.

Justin

Apples and pears vs. oranges and bananas

When people talk about comparing disparate things, they often say that you’re comparing apples and oranges. However, sometimes that expression doesn’t begin to describe the situation.

Recently, Justin wrote about using CrXPRT on systems running Neverware CloudReady OS. In that post, he noted that we couldn’t guarantee that using CrXPRT on CloudReady and Chrome OS systems would be a fair comparison. Not surprisingly, that prompted the question “Why not?”

Here’s the thing: It’s a fair comparison of those software stacks running on those hardware configurations. If everyone accepted that and stopped there, all would be good. However, almost inevitably, people will read more into the scores than is appropriate.

In such a comparison, we’re changing multiple variables at once. We’ve written before about the effect of the software stack on performance. CloudReady and Chrome OS are two different implementations of the Chromium OS, and it’s possible that one is more efficient than the other. If so, that would affect CrXPRT scores. At the same time, the raw performance of the two hardware configurations under test could also differ to a certain degree, which would also affect CrXPRT scores.

Here’s a metaphor: If you measure the effective force at the end of two levers and find a difference, to what do you attribute that difference? If you know the levers are the same length, you can attribute the difference to the amount of applied force. If you know the applied force is identical, you can attribute the difference to the length of the levers. If you lack both of those data points, you can’t know whether the difference is due to the length, the force, or a combination of the two.
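To make the lever metaphor concrete, here is a toy sketch in Python. The numbers are made up purely for illustration and aren’t from any XPRT test; the point is that two different combinations of applied force and lever ratio can produce the exact same measurement:

```python
# Toy illustration of the lever metaphor: the measured output force
# depends on two variables (applied force and lever-arm ratio), so a
# difference, or lack of one, in the output alone cannot be attributed
# to either variable.

def output_force(applied_force, effort_arm, load_arm):
    """Ideal lever: output force = applied force * (effort arm / load arm)."""
    return applied_force * (effort_arm / load_arm)

# Two different combinations of force and lever length...
lever_a = output_force(applied_force=10.0, effort_arm=2.0, load_arm=1.0)
lever_b = output_force(applied_force=5.0, effort_arm=4.0, load_arm=1.0)

# ...produce identical measurements (both 20.0), so the measurement
# alone cannot tell us which variable (or both) differed.
print(lever_a, lever_b)
```

Just as with a benchmark score from two systems that differ in both hardware and software, the single measured number is real and accurate, but it can’t by itself tell you which underlying variable caused (or masked) a difference.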

With a benchmark, you can run multiple experiments designed to isolate variables and use the results from those experiments to look for trends. For example, we could install both CloudReady OS and Chrome OS on the same Intel-based Chromebook and compare the CrXPRT results. Because that removes hardware differences as a variable, such an experiment would offer some insight into how the two implementations compare. However, because differences in hardware can affect the performance of a given piece of software, this single data point would be of limited value. We could repeat the experiment on a variety of other Intel-based Chromebooks, and other patterns might emerge. If one of the implementations consistently scored higher, that would suggest that it was more efficient than the other, but would still not be conclusive.

I hope this gives you some idea about why we are cautious about drawing conclusions when comparing results from different sets of hardware running different software stacks.

Eric

BatteryXPRT: A quick and reliable way to estimate Android battery life

In the last few weeks, we reintroduced readers to the capabilities and benefits of TouchXPRT and CrXPRT. This week, we’d like to reintroduce BatteryXPRT 2014 for Android, an app that evaluates the battery life and performance of Android devices.

When purchasing a phone or tablet, it’s good to know how long the battery will last on a typical day and how often you’ll need to charge it. Before BatteryXPRT, you had to rely on a manufacturer’s estimate or full rundown tests that perform tasks that don’t resemble the types of things we do with our phones and tablets every day.

We developed BatteryXPRT to estimate battery life reliably in just over five hours, so testers can complete a full evaluation in one work day or while sleeping. You can configure it to run while the device is connected to a network or in Airplane mode. The test also produces a performance score by running workloads that represent common everyday tasks.

BatteryXPRT is easy to install and run, and is a great resource for anyone who wants to evaluate how well an Android device will meet their needs. If you’d like to see test results from a variety of Android devices, go to BatteryXPRT.com and click View Results.

If you’d like to run BatteryXPRT:

Simply download BatteryXPRT from the Google Play store or BatteryXPRT.com. The BatteryXPRT installation instructions and user manual provide step-by-step instructions for how to configure your device and kick off a test. We designed BatteryXPRT 2014 for Android to be compatible with a wide variety of Android devices, but because there are so many devices on the market, it is inevitable that users occasionally run into problems. In the Tips, tricks, and known issues document, we provide troubleshooting suggestions for issues we encountered during development testing.

If you’d like to learn more:

We offer a full online BatteryXPRT training course that covers almost every aspect of the benchmark. You can view the sections in order or jump to the parts that interest you. We guarantee that you’ll learn something new!

BatteryXPRT 2014 for Android Training Course

If you’d like to dig into the details:

Check out the Exploring BatteryXPRT 2014 for Android white paper. In it, we discuss the app’s development and structure. We also describe the component tests; explain the differences between the test’s Airplane, Wi-Fi, and Cellular modes; and detail the statistical processes we use to calculate expected battery life.

If you’d like to dig even deeper, the BatteryXPRT source code is available to members of the BenchmarkXPRT Development Community, so consider joining today. Membership is free for members of any company or organization with an interest in benchmarks, and there are no obligations after joining.

If you haven’t used BatteryXPRT before, try it out and let us know what you think!

Justin

Learning about machine learning

Everywhere we look, machine learning is in the news. It’s driving cars and beating the world’s best Go players. Whether we are aware of it or not, it’s in our lives, understanding our voices and identifying our pictures.

Our goal of being able to measure the performance of hardware and software that does machine learning seems more relevant than ever. Our challenge is to scan the vast landscape that is machine learning, and identify which elements to measure first.

There is a natural temptation to see machine learning as being all about neural networks such as AlexNet and GoogLeNet. However, innovations appear all the time, and lots of important work with more classic machine learning techniques is also underway. (Classic machine learning being anything more than a few years old!) Recurrent neural networks used for language translation, reinforcement learning used in robotics, and support vector machine (SVM) learning used in text recognition are just a few examples among the wide array of algorithms to consider.
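As a reminder of how lightweight a “classic” technique can be, here is a minimal, self-contained Python sketch of a perceptron, a linear classifier in the same family as SVMs. The two-feature data set is invented for illustration and has nothing to do with any planned XPRT workload:

```python
# Minimal sketch of a classic machine-learning technique: a perceptron,
# a simple linear classifier related to support vector machines.
# The training data below is made up purely for illustration.

def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Learn weights w and bias b such that sign(w.x + b) matches each label."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y is +1 or -1
            # Update only when the current model misclassifies the sample.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Classify a sample as +1 or -1 using the learned linear boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Tiny, linearly separable toy data set: two features, two classes.
samples = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.0), (-2.0, -3.0)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(samples, labels)
print([predict(w, b, x) for x in samples])  # recovers the training labels
```

Techniques like this run comfortably on modest local hardware, which is part of why classic machine learning remains relevant alongside deep neural networks.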

Creating a benchmark or set of benchmarks to cover all those areas, however, is unlikely to be possible. Certainly, creating such an ambitious tool would take so long that it would be of limited usefulness.

Our current thinking is to begin with a small set of representative algorithms. The challenge, of course, is identifying them. That’s where you come in. What would you like to start with?

We anticipate that the benchmark will focus on the types of inference learning and light training that are likely to occur on edge devices. Extensive training with large datasets takes place in data centers or on systems with extraordinary computing capabilities. We’re interested in use cases that will stress the local processing power of everyday devices.

We are, of course, reaching out to folks in the machine learning field—including those in academia, those who create the underlying hardware and software, and those who make the products that rely on that hardware and software.

What do you think?

Bill

Evolve or die

Last week, Google announced that it would retire its Octane benchmark. Their announcement explains that they designed Octane to spur improvement in JavaScript performance, and while it did just that when it was first released, those improvements have plateaued in recent years. They also note that there are some operations in Octane that optimize Octane scores but do not reflect real-world scenarios. That’s unfortunate, because they, like most of us, want improvements in benchmark scores to mean improvements in end-user experience.

WebXPRT comes at the web performance issue differently. While Octane’s goal was to improve JavaScript performance, the purpose of WebXPRT is to measure performance from the end user’s perspective. By doing the types of work real people do, WebXPRT doesn’t measure only improvements in JavaScript performance; it also measures the quality of the real-world user experience. WebXPRT’s results also reflect the performance of the entire device and software stack, not just the performance of the JavaScript interpreter.

Google’s announcement reminds us that benchmarks have finite life spans and must constantly evolve to keep pace with changes in technology, or they will become useless. To make sure the XPRT benchmarks do just that, we are always looking at how people use their devices and developing workloads that reflect their actions. This is a core element of the XPRT philosophy.

As we mentioned last week, we’ve been working on the next version of WebXPRT. If you have any thoughts about how it should evolve, let us know!

Eric

Thinking ahead to WebXPRT 2017

A few months ago, Bill discussed our intention to update WebXPRT this year. Today, we want to share some initial ideas for WebXPRT 2017 and ask for your input.

Updates to the workloads provide an opportunity to increase the relevance and value of WebXPRT in the years to come. Here are a few of the ideas we’re considering:

  • For the Photo Enhancement workload, we can increase the data sizes of pictures. We can also experiment with additional types of photo enhancement such as background/foreground subtraction, collage creation, or panoramic/360-degree image viewing.
  • For the Organize Album workload, we can explore machine learning workloads by incorporating open source JavaScript libraries into web-based inferencing tests.
  • For the Local Notes workload, we’re investigating the possibility of leveraging natural-brain libraries for language processing functions.
  • For a new workload, we’re investigating the possibility of using online 3D modeling applications such as Tinkercad.

For the UI, we’re considering improvements to features like the in-test progress bars and individual subtest selection. We’re also planning to update the UI to make it visually distinct from older versions.

Throughout this process, we want to be careful to maintain the features that have made WebXPRT our most popular tool, with more than 141,000 runs to date. We’re committed to making sure that it runs quickly and simply in most browsers and produces results that are useful for comparing web browsing performance across a wide variety of devices.

Do you have feedback on these ideas or suggestions for browser technologies or test scenarios that we should consider for WebXPRT 2017? Are there existing features we should ditch? Are there elements of the UI that you find especially useful or would like to see improved? Please let us know. We want to hear from you and make sure that we’re crafting a performance tool that continues to meet your needs.

Justin
