
Category: Benchmarking

Looking under the hood

In the next couple of weeks, we’ll publish the source code and build instructions for the latest HDXPRT 2014 and BatteryXPRT 2014 builds. Access to XPRT source code is one of the benefits of BenchmarkXPRT Development Community membership. For readers who may not know, this is a good time to revisit the reasons we make the source code available.

The primary reason is transparency; we want the XPRTs to be as open as possible. As part of our community model for software development, the source code is available to anyone who joins the community. Closed-source benchmark development can lead some people to infer that a benchmark is biased in some way. Our approach makes it impossible to hide any biases.

Another reason we publish source code is to encourage collaborative development and innovation. Community members are involved in XPRT development from the beginning, helping to identify emerging technologies in need of reliable benchmarking tools, suggesting potential workloads and improvements, reviewing design documents, and offering all sorts of general feedback.

Simply put, if you’re interested in benchmarking and the BenchmarkXPRT Development Community, then we’re interested in what you have to say! Community input helps us at every step of the process, and ultimately helps us to create benchmarking tools that are as reliable and relevant as possible.

If you’d like to review XPRT source code, but haven’t yet joined the community, we encourage you to go ahead and join! It’s easy, and if you work for a company or organization with an interest in benchmarking, you can join the community for free. Simply fill out the form with your company e-mail address and click the option to be considered for a free membership. We’ll contact you to verify the address is real and then activate your membership.

If you have any other questions about community membership or XPRT source code, feel free to contact us. We look forward to hearing from you!

Justin

Digging deeper

From time to time, we like to revisit the fundamentals of the XPRT approach to benchmark development. Today, we’re discussing the need for testers and benchmark developers to consider the multiple factors that influence benchmark results. For every device we test, all of its hardware and software components have the potential to affect performance, and changing the configuration of those components can significantly change results.

For example, we frequently see significant performance differences between different browsers on the same system. In our recent recap of the XPRT Weekly Tech Spotlight’s first year, we highlighted an example of how testing the same device with the same benchmark can produce different results, depending on the software stack under test. In that instance, the Alienware Steam Machine entry included a WebXPRT 2015 score for each of the two browsers that consumers were likely to use. The first score (356) represented the SteamOS browser app in the SteamOS environment, and the second (441) represented the Iceweasel browser (a Firefox variant) in the Linux-based desktop environment. Including only the first score would have given readers an incomplete picture of the Steam Machine’s web-browsing capabilities, so we thought it was important to include both.
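
To put the gap between those two scores in perspective, here is a minimal sketch of the arithmetic. The function and variable names are our own, chosen for illustration; only the two scores come from the Spotlight entry.

```python
def percent_difference(baseline: float, other: float) -> float:
    """Return how much higher (or lower) `other` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (other - baseline) / baseline * 100.0


# WebXPRT 2015 scores from the Alienware Steam Machine entry.
steamos_browser_score = 356
iceweasel_score = 441

gap = percent_difference(steamos_browser_score, iceweasel_score)
print(f"Iceweasel scored {gap:.1f}% higher than the SteamOS browser app")
```

On the same hardware, the second browser scored roughly 24 percent higher, which is why a single score would have told only part of the story.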

We also see performance differences between different versions of the same browser, a fact especially relevant to those who use frequently updated browsers, such as Chrome. Even benchmarks that measure the same general area of performance, for example, web browsing, are usually testing very different things.

OS updates can also have an impact on performance. Consumers might base a purchase on performance or battery life scores and end up with a device that behaves much differently when updated to a new version of Android or iOS, for example.

Other important factors in the software stack include pre-installed software, commonly referred to as bloatware, and the proliferation of apps that sap performance and battery life.

This is a much larger topic than we can cover in the blog. Let the examples we’ve mentioned remind you to think critically about, and dig deeper into, benchmark results. If we see published XPRT scores that differ significantly from our own results, our first question is always “What’s different between the two devices?” Most of the time, the answer becomes clear as we compare hardware and software from top to bottom.

Justin

Celebrating one year of the XPRT Weekly Tech Spotlight

It’s been just over a year since we launched the XPRT Weekly Tech Spotlight by featuring our first device, the Google Pixel C. Spotlight has since become one of the most popular items at BenchmarkXPRT.com, and we thought now would be a good time to recap the past year, offer more insight into the choices we make behind the scenes, and look at what’s ahead for Spotlight.

The goal of Spotlight is to provide PT-verified specs and test results that can help consumers make smart buying decisions. We try to include a wide variety of device types, vendors, software platforms, and price points in our inventory. The devices also tend to fall into one of two main groups: popular new devices generating a lot of interest and devices that have unique form factors or unusual features.

To date, we’ve featured 56 devices: 16 phones, 11 laptops, 10 two-in-ones, 9 tablets, 4 consoles, 3 all-in-ones, and 3 small-form-factor PCs. The operating systems these devices run include Android, ChromeOS, iOS, macOS, OS X, Windows, and an array of vendor-specific OS variants and skins.

As much as possible, we test using out-of-the-box (OOB) configurations. We want to present test results that reflect what everyday users will experience on day one. Depending on the vendor, the OOB approach can mean that some devices arrive bogged down with bloatware while others are relatively clean. We don’t attempt to “fix” anything in those situations; we simply test each device “as is” when it arrives.

If devices arrive with outdated OS versions (as is often the case with Chromebooks), we update to current versions before testing, because that’s the best reflection of what everyday users will experience. In the past, that approach would’ve been more complicated with Windows systems, but the Microsoft shift to “Windows as a service” ensures that most users receive significant OS updates automatically by default.

The OOB approach also means that the WebXPRT scores we publish reflect the performance of each device’s default browser, even if it’s possible to install a faster browser. Our goal isn’t to perform a browser shootout on each device, but to give an accurate snapshot of OOB performance. For instance, last week’s Alienware Steam Machine entry included two WebXPRT scores, a 356 on the SteamOS browser app and a 441 on Iceweasel 38.8.0 (a Firefox variant used in the device’s Linux-based desktop mode). That’s a significant difference, but the main question for us was which browser was more likely to be used in an OOB scenario. With the Steam Machine, the answer was truly “either one.” Many users will use the browser app in the SteamOS environment and many will take the few steps needed to access the desktop environment. In that case, even though one browser was significantly faster than the other, choosing to omit one score in favor of the other would have excluded results from an equally likely OOB environment.

We’re always looking for ways to improve Spotlight. We recently began including more photos for each device, including ones that highlight important form-factor elements and unusual features. Moving forward, we plan to expand Spotlight’s offerings to include automatic score comparisons, additional system information, and improved graphical elements. Most importantly, we’d like to hear your thoughts about Spotlight. What devices and device types would you like to see? Are there specs that would be helpful to you? What can we do to improve Spotlight? Let us know!

Justin

Mobile World Congress 2017 and the territories ahead

Walking the halls of this year’s Mobile World Congress (MWC)—and, once again, I walked by every booth in every one of them—it was clear that mobile technology is expanding faster than ever into more new tech territories than ever before.

On the device front, cameras and camera quality have become a pitched battleground, with mobile phone makers teaming with camera manufacturers to give us better and better images and video. This fight is far from over, too, because vendors are exploring many different ways to improve mobile phone camera quality. Quick charging is a hot new trend we can expect to hear more about in the days to come. Of course, apps and their performance continue to matter greatly, because if you can do it from any computer, you’d better be able to do at least some of it from your phone.

The Internet of Things (IoT) grabbed many headlines. Vendors are still selling more dreams than reality, but some industries are already living this future. The proliferation of IoT devices will result, of course, in massive increases in the amount of data flowing through the world’s networks, which in turn will require more and more computing power to analyze and use. That power will need to be everywhere, from massive datacenters to the device in your hand, because the more data you have, the more you’ll want to customize it to your particular needs.

Similarly, AI was a major theme of the show, and it’s also likely to suck up computing cycles everywhere. The vast majority of the work will, of course, end up in datacenters, but some processing is likely to be local, particularly in situations, such as real-time translation, where we can’t afford significant comm delays.

5G, the next big step in mobile data speeds, was everywhere. Most companies seemed to agree that the new standard was still years away, but they were also excited about what it will make possible. When you can stream 4K movies to your phone wirelessly while simultaneously receiving and customizing analyses of your company’s IoT network, you’re going to need a powerful, sophisticated device running equally powerful and sophisticated apps.

Everywhere I looked, the future was bright—and complicated, and likely to place increasing demands on all of our devices. We’ll need guides as we find our paths through these new territories and as we determine the right device tools for our jobs, so the need for the XPRTs will only increase. I look forward to seeing where we, the BenchmarkXPRT Development Community, take them next.

Mark

A new reality

A while back, I wrote about a VR demo built by students from North Carolina State University. We’ve been checking it out over the last couple of months and are very impressed. This workload will definitely heat up your device! While the initial results look promising, this is still an experimental workload and it’s too early to use results in formal reviews or product comparisons.

We’ve created a page that tells all about the VR demo. As an experimental workload, the demo is available only to community members. As always, members can download the source as well as the APK.

We asked the students to try to build the workload for iOS as a stretch goal. They successfully built an iOS version, but this was at the end of the semester and there was little time for testing. If you want to experiment with iOS yourself, look at the build instructions for Android and iOS that we include with the source. Note that you will need Xcode to build and deploy the demo on iOS.

After you’ve checked out the workload, let us know what you think!

Finally, we have a new video featuring the VR demo. Enjoy!

(Video: VR demo)

Eric

Experience is the best teacher

One of the core principles that guides the design of the XPRT tools is that they should reflect the way real-world users use their devices. The XPRTs try to use applications and workloads that reflect what users do and the way that real applications function. How did we learn how important this is? The hard way—by making mistakes! Here’s one example.

In the 1990s, I was Director of Testing for the Ziff-Davis Benchmark Operation (ZDBOp). The benchmarks ZDBOp created for its technical magazines became the industry standards, because of both their quality and Ziff-Davis’ leadership in the technical trade press.

WebBench, one of the benchmarks ZDBOp developed, measured the performance of early web servers. We worked hard to create a tool that used physical clients and tested web server performance over an actual network. However, we didn’t pay enough attention to how clients actually interacted with the servers. In the first version of WebBench, the clients opened connections to the server, did a small amount of work, closed the connections, and then opened new ones.

When we met with vendors after the release of WebBench, they begged us to change the model. At that time, browsers opened relatively long-lived connections and did lots of work before closing them. Our model was almost the opposite of that. It put vendors in the position of having to choose between coding to give their users good performance and coding to get good WebBench results.

Of course, we were horrified by this, and worked hard to make the next version of the benchmark reflect more closely the way real browsers interacted with web servers. Subsequent versions of WebBench were much better received.
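
To make the contrast between the two connection models concrete, here is a rough sketch that simply counts how many connections a client would open under each pattern. It is not WebBench code. The function name and every number in it are hypothetical, picked only to illustrate the idea.

```python
import math


def connections_needed(total_requests: int, requests_per_connection: int) -> int:
    """Count the connections a client opens to issue `total_requests`,
    if each connection carries at most `requests_per_connection` requests."""
    if total_requests < 0 or requests_per_connection <= 0:
        raise ValueError("need total_requests >= 0 and requests_per_connection > 0")
    return math.ceil(total_requests / requests_per_connection)


# Hypothetical workload sizes, for illustration only.
TOTAL_REQUESTS = 1000

# Pattern like the first WebBench: a little work, then close and reconnect.
short_lived = connections_needed(TOTAL_REQUESTS, requests_per_connection=2)

# Pattern like browsers of the time: lots of work on each long-lived connection.
long_lived = connections_needed(TOTAL_REQUESTS, requests_per_connection=100)

print(f"short-lived model: {short_lived} connections")
print(f"long-lived model:  {long_lived} connections")
```

With these made-up numbers, the short-lived pattern opens 50 times as many connections for the same amount of work. A server tuned to do well under that pattern is spending its effort on connection setup and teardown, which is not where real browsers of the time were putting the load.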

This is one of the roots from which the XPRT philosophy grew. We have tried to learn and grow from the mistakes we’ve made. We’d love to hear about any of your experiences with performance tools so we can all learn together.

Eric
