
Category: Browser-based benchmarks

WebXPRT and user-agent strings

After running WebXPRT in Microsoft Edge, a tester recently asked why the browser information field on the results page displayed “Chrome 52 – Edge 15.15063.” It’s a good question; why would the benchmark report Chrome 52 when Microsoft Edge is the browser under test? The answer lies in understanding user-agent strings and the way that WebXPRT gathers specific bits of information.

When browsers request a web page from a hosting server, they send a set of basic request headers that allows the server to determine the client’s capabilities and the best way to provide the requested content. One of these headers, the user-agent, contains a string of tokens that provides information about the application making the request, the operating system and version, rendering-engine compatibility, and browser platform details. In effect, the user-agent string is a way for a browser to tell the hosting server the full extent of its capabilities.
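To picture where the user-agent string shows up on the server side, here’s a minimal sketch, our illustration rather than anything WebXPRT does, of a Node.js server that logs the User-Agent header each incoming request carries:

```javascript
// Illustrative sketch only: log the User-Agent header a browser sends
// with each request. Not part of WebXPRT.
const http = require('http');

http.createServer((req, res) => {
  // The browser's user-agent string arrives as a standard request header.
  console.log('User-Agent:', req.headers['user-agent']);
  res.end('Hello');
}).listen(8080);
```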

When WebXPRT attempts to identify a browser, it references the browser token in the user-agent string.

The process is generally straightforward, but in some cases, browsers spoof information from other browsers in their user-agent strings, which makes accurate browser detection difficult. The reasons for this are complex, but they involve web development practices and the fact that some web pages are not designed to recognize and work well with new or less-popular browsers. When we released WebXPRT 2015, Microsoft Edge was new. The Edge team wanted to make sure that as much advanced web content as possible would be available to Edge users, so they created a user-agent string that declared itself to be several different browsers at once.

I can see this in action if I check Edge’s user-agent string on my system. Currently, it reports “Mozilla/5.0 (Windows NT 10.0; Win64; x64; ServiceUI 9) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063.” So, because of the way Edge’s user-agent string is constructed, and the way WebXPRT parses that information, the browser information field on WebXPRT’s results page will report “Chrome 52 – Edge 15.15063” on my system.
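To see how that combined label can come about, here’s a minimal sketch, not WebXPRT’s actual detection code, of a parser that looks for both the Chrome and Edge tokens in that string:

```javascript
// Minimal illustration (not WebXPRT's actual code): pull the Chrome and Edge
// tokens out of a user-agent string with simple regular expressions.
const ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; ServiceUI 9) " +
  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 " +
  "Safari/537.36 Edge/15.15063";

const chrome = ua.match(/Chrome\/(\d+)/);   // -> ["Chrome/52", "52"]
const edge   = ua.match(/Edge\/([\d.]+)/);  // -> ["Edge/15.15063", "15.15063"]

let label = "Unknown browser";
if (chrome && edge) {
  // Edge's user-agent string declares itself as Chrome, too, so a simple
  // parser ends up reporting both tokens.
  label = `Chrome ${chrome[1]} – Edge ${edge[1]}`;
} else if (chrome) {
  label = `Chrome ${chrome[1]}`;
}

console.log(label); // "Chrome 52 – Edge 15.15063"
```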

To check the user-agent string on your own system, open Edge, select the ellipsis icon in the top right-hand corner, and then select F12 Developer Tools. Next, select the Console tab and run javascript:alert(navigator.userAgent). A popup window will display the user-agent string.

You can find instructions for viewing the user-agent string in other browsers here: http://techdows.com/2016/07/edge-ie-chrome-firefox-user-agent-strings.html.

In the next version of WebXPRT, we’ll work to refine the way that the test parses user-agent strings, and provide more accurate system information for testers. If you have any questions or suggestions regarding this topic, let us know!

Justin

Best practices

Recently, a tester wrote in and asked for help determining why they were seeing different WebXPRT scores on two tablets with the same hardware configuration. The scores differed by approximately 7.5 percent. This can happen for many reasons, including different software stacks, but score variability can also result from different testing behavior and environments. While some degree of variability is natural, the question provides us with a great opportunity to talk about the basic benchmarking practices we follow in the XPRT lab, practices that contribute to the most consistent and reliable scores.

Below, we list a few basic best practices you might find useful in your own testing. While we present them largely in the context of WebXPRT’s focus on evaluating browser performance, several of these practices apply to other benchmarks as well.

  • Test with clean images: We use an out-of-box (OOB) method for testing XPRT Spotlight devices. OOB testing means that, other than the initial OS and browser updates that users are likely to run after first turning on the device, we change as little as possible before testing. We want to assess the performance that buyers are likely to see when they first purchase the device, before they install additional apps and utilities. While the OOB approach is not appropriate for every type of testing, the key is not to test a device that’s bogged down with programs that unnecessarily influence results.
  • Turn off updates: We do our best to eliminate or minimize app and system updates after initial setup. Some vendors are making it more difficult to turn off updates completely, but you should always account for update settings.
  • Get a feel for system processes: Depending on the system and the OS, quite a lot of system-level activity can be going on in the background after you turn on a device. As much as possible, we like to wait for system activity to settle to a stable, idle baseline before kicking off a test. If we start testing immediately after booting the system, we often see higher variability in the first run before the scores start to tighten up.
  • Disclosure is not just about hardware: Most people know that different browsers will produce different performance scores on the same system. However, testers aren’t always aware of shifts in performance between different versions of the same browser. While most updates don’t have a large impact on performance, a few updates have increased (or even decreased) browser performance by a significant amount. For this reason, it’s always worthwhile to record and disclose the extended browser version number for each test run. The same principle applies to any other relevant software.
  • Use more than one data point: Because of natural variability, our standard practice in the XPRT lab is to publish a score that represents the median of at least three to five runs. If you run a benchmark only once and the score differs significantly from other published scores, your result could be an outlier that you would not see again under stable testing conditions. (The brief sketch below shows that median calculation.)
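Here’s that sketch; the run scores are hypothetical placeholders, not published results:

```javascript
// Illustrative only: compute the median of several benchmark runs.
// The scores below are placeholders, not published WebXPRT results.
function median(scores) {
  const sorted = [...scores].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 !== 0
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

const runs = [182, 187, 184, 190, 185]; // five hypothetical run scores
console.log(median(runs));              // 185
```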


We hope those tips will make testing a little easier for you. If you have any questions about the XPRTs, or about benchmarking in general, feel free to ask!

Justin

Evolve or die

Last week, Google announced that it would retire its Octane benchmark. The announcement explains that Google designed Octane to spur improvement in JavaScript performance, and while it did just that when it was first released, those improvements have plateaued in recent years. The announcement also notes that some operations in Octane invite optimizations that boost Octane scores but do not reflect real-world scenarios. That’s unfortunate, because Google, like most of us, wants improvements in benchmark scores to mean improvements in end-user experience.

WebXPRT comes at the web performance issue differently. While Octane’s goal was to improve JavaScript performance, the purpose of WebXPRT is to measure performance from the end user’s perspective. By doing the types of work real people do, WebXPRT doesn’t measure only improvements in JavaScript performance; it also measures the quality of the real-world user experience. WebXPRT’s results also reflect the performance of the entire device and software stack, not just the performance of the JavaScript interpreter.

Google’s announcement reminds us that benchmarks have finite life spans and must constantly evolve to keep pace with changes in technology, or they will become useless. To make sure the XPRT benchmarks do just that, we are always looking at how people use their devices and developing workloads that reflect their actions. This is a core element of the XPRT philosophy.

As we mentioned last week, we’ve been working on the next version of WebXPRT. If you have any thoughts about how it should evolve, let us know!

Eric

Thinking ahead to WebXPRT 2017

A few months ago, Bill discussed our intention to update WebXPRT this year. Today, we want to share some initial ideas for WebXPRT 2017 and ask for your input.

Updates to the workloads provide an opportunity to increase the relevance and value of WebXPRT in the years to come. Here are a few of the ideas we’re considering:

  • For the Photo Enhancement workload, we can increase the data size of the pictures. We can also experiment with additional types of photo enhancement, such as background/foreground subtraction, collage creation, or panoramic/360-degree image viewing.
  • For the Organize Album workload, we can explore machine learning workloads by incorporating open-source JavaScript libraries into web-based inferencing tests (see the sketch after this list).
  • For the Local Notes workload, we’re investigating the possibility of leveraging natural-brain libraries for language processing functions.
  • For a new workload, we’re investigating the possibility of using online 3D modeling applications such as Tinkercad.

 
For the UI, we’re considering improvements to features like the in-test progress bars and individual subtest selection. We’re also planning to update the UI to make it visually distinct from older versions.

Throughout this process, we want to be careful to maintain the features that have made WebXPRT our most popular tool, with more than 141,000 runs to date. We’re committed to making sure that it runs quickly and simply in most browsers and produces results that are useful for comparing web browsing performance across a wide variety of devices.

Do you have feedback on these ideas or suggestions for browser technologies or test scenarios that we should consider for WebXPRT 2017? Are there existing features we should ditch? Are there elements of the UI that you find especially useful or would like to see improved? Please let us know. We want to hear from you and make sure that we’re crafting a performance tool that continues to meet your needs.

Justin

Tracking device evolution with WebXPRT ’15, part 2

Last week, we used the Apple iPhone as a test case to show how hardware advances are often reflected in benchmark scores over time. When we compared WebXPRT 2015 scores for various iPhone models, we saw a clear trend of progressively higher scores as we moved from phones with an A7 chip to phones with A8, A9, and A10 Fusion chips. Performance increases over time are not surprising, but WebXPRT ’15 scores also showed us that upgrading from an iPhone 6 to an iPhone 6s is likely to have a much greater impact on web-browsing performance than upgrading from an iPhone 6s to an iPhone 7.

This week, we’re revisiting our iPhone test case to see how software updates can boost device performance without any changes in hardware. The original WebXPRT ’15 tests for the iPhone 5s ran on iOS 8.3, and the original tests for the iPhone 6s, 6s Plus, and SE ran on variants of iOS 9. We updated each phone to iOS 10.0.2 and ran several iterations of WebXPRT ’15.

Upgrading from iOS 8.3 to iOS 10 on the iPhone 5s caused a 17% increase in web-browsing performance, as measured by WebXPRT. Upgrading from iOS 9 to iOS 10 on the iPhone 6s, 6s Plus, and SE produced web-browsing performance gains of 2.6%, 3.6%, and 3.1%, respectively.

The chart below shows the WebXPRT ’15 scores for a range of iPhones, with each iPhone’s iOS version upgrade noted in parentheses. The dark blue columns on the left represent the original scores, and the light blue columns on the right represent the upgrade scores.

[Chart: WebXPRT ’15 scores for each iPhone before and after its iOS upgrade]

As with our hardware comparison last week, these scores are the median of a range of scores for each device in our database. The scores come from our own testing and from device reviews published by popular tech media outlets.

These results reinforce a message that we repeat often, that many factors other than hardware influence performance. Designing benchmarks that deliver relevant and reliable scores requires taking all factors into account.

What insights have you gained recently from WebXPRT ’15 testing? Let us know!

Justin

Apples to apples?

PCMag published a great review of the Opera browser this week. In addition to looking at the many features Opera offers, the review included performance data from multiple benchmarks, which look at areas such as hardware graphics acceleration, WebGL performance, memory consumption, and battery life.

Three of the benchmarks have a significant, though not exclusive, focus on JavaScript performance: Google Octane 2.0, JetStream 1.1, and WebXPRT 2015. The three benchmarks did not rank the browsers the same way, and in the past, we’ve discussed some of the reasons why this happens. In addition to differences in the tests themselves, there are sometimes differences in approach that are worth considering.

For example, look at the test descriptions for JetStream 1.1. You’ll immediately notice that these tests are much lower-level than the ones in WebXPRT. Now consider these phrases from a few of the descriptions:

  • code-first-load “…This test attempts to defeat the browser’s caching capabilities…”
  • splay-latency “Tests the worst-case performance…”
  • zlib “…modified to restrict code caching opportunities…”

 

While the XPRTs test typical performance for higher-level applications, the tests in JetStream are tweaked to stress devices in very specific ways, some of which are not typical. The information these tests provide can be very useful for engineers and developers, but it may not be as meaningful to the typical user.

I have to stress that both approaches are valid, but they are doing somewhat different things. There’s a cliché about comparing apples to apples, but not all apples are the same. If you’re making a pie, a Granny Smith would be a good choice, but for snacking, you might be better off with a Red Delicious. Knowing a benchmark’s purpose will help you find the results that are most meaningful to you.

Eric
