BenchmarkXPRT Blog banner

Category: Benchmarking

News about WebXPRT and BatteryXPRT

Last month, we gave readers a glimpse of the updates in store for the next WebXPRT, and now we have more news to report on that front.

The new version of WebXPRT will be called WebXPRT 3. WebXPRT 3 will retain the convenient features that made WebXPRT 2013 and WebXPRT 2015 our most popular tools, with more than 200,000 combined runs to date. We’ve added new elements, including AI, to a few of the workloads, but the test will still run in 15 minutes or less in most browsers and produce the same easy-to-understand results that help compare browsing performance across a wide variety of devices.

We’re also very close to publishing the WebXPRT 3 Community Preview. For those unfamiliar with our open development community model, BenchmarkXPRT Development Community members have the ability to preview and test new benchmark tools before we release them to the general public. Community previews are a great way for members to evaluate new XPRTs and send us feedback. If you’re interested in joining, you can register here.

In BatteryXPRT news, we recently started to see unusual battery life estimates and high variance when running battery life tests at the default length of 5.25 hours. We think this may be due to changes in how new OS versions are reporting battery life on certain devices, but we’re in the process of extensive testing to learn more. In the meantime, we recommend that BatteryXPRT users adjust the test run time to allow for a full rundown.

Do you have questions or comments about WebXPRT or BatteryXPRT? Let us know!

Justin

How to submit results for the WebXPRT Processor Comparison Chart

The WebXPRT 2015 Processor Comparison Chart is in its second month, and we’re excited to see that people are browsing the scores. We’re also starting to receive more WebXPRT score submissions for publication, so we thought it would be a good time to describe how that process works.

Unlike sites that publish any results they receive, we hand-select results from internal lab testing, end-of-test user submissions, and reliable tech media sources. In each case, we evaluate whether the score is consistent with general expectations. For sources outside of our lab, that evaluation includes checking to see whether there is enough detailed system information to get a sense of whether the score makes sense. We do this for every score on the WebXPRT results page and the general XPRT results page.

If we decide to publish a WebXPRT result, that score automatically appears in the processor comparison chart as well. If you would like to submit your score, the submission process is quick and easy. At the end of the WebXPRT test run, click the Submit button below the individual workload scores, complete the short submission form, and click Submit again. The screenshot below shows how the form would look if I submitted a score at the end of a WebXPRT run on my personal system.

WebXPRT results submission

After you submit your score, we’ll contact you to confirm the name we should display as the source for the data. You can use one of the following:

  • Your first and last name
  • “Independent tester,” if you wish to remain anonymous
  • Your company’s name, provided that you have permission to submit the result in their name. If you want to use a company name, we ask that you provide your work email address.


We will not publish any additional information about you or your company without your permission.

We look forward to seeing your score submissions, and if you have suggestions for the processor chart or any other aspect of the XPRTs, let us know!

Justin

Nothing to hide

I recently saw an article in ZDNet by my old friend Steven J. Vaughan-Nichols that talks about how NetMarketShare and StatCounter reported a significant jump in the operating system market shares for Linux and Chrome OS. One frustration Vaughan-Nichols alluded to in the article is the lack of transparency into how these firms calculated market share, so he can’t gauge how reliable they are. Because neither NetMarketShare nor StatCounter disclosed their methods, there’s no sure way for interested observers to verify the numbers. Steven prefers the data from the federal government’s Digital Analytics Program (DAP). DAP makes its data freely available, so you can run your own calculations. Transparency generates trust.

Transparency is a core value for the XPRTs. We’ve written before about how statistics can be misleading. That’s why we’ve always disclosed exactly how the XPRTs calculate performance results, and the way BatteryXPRT calculates battery life. It’s also why we make each XPRT’s source code available to community members. We want to be open and honest about how we do things, and our open development community model fosters the kind of constructive feedback that helps to continually improve the XPRTs.

We’d love for you to be a part of that process, so if you have questions or suggestions for improvement, let us know. If you’d like to gain access to XPRT source code and previews of upcoming benchmarks, today is a great day to join the community!

Eric

Glimpses of the next WebXPRT

Development work on the next version of WebXPRT is well underway, and we think it’s a good time to offer a glimpse of what’s to come.

We’ve updated the photo-related workloads with new images and are experimenting with adding a new task to the Organize Album workload. The task utilizes ConvNetJS, a JavaScript library designed for training neural networks within the browser itself, to assign classifications to a set of images. It’s an example of the type of integrated deep learning tasks that will be showing up in all sorts of devices in the years to come.

We’re also testing an additional task in the Local Notes workload using Tesseract.js, a popular OCR (optical character recognition) engine. Our scenario uses OCR technology to scan images of purchase receipts and gather relevant information.

We’re testing these new tasks now, and will include them only once we’re confident that they produce consistent and reliable results without extending the benchmark’s runtime unnecessarily.  Consequently, the next WebXPRT might contain variations of these tasks, or other new technologies altogether. We mention them now to offer some insight into the types of workload enhancements that we’re considering.

We’ve been working hard on the new WebXPRT UI as well. The image below shows the new start page from an early development build. We’re still making adjustments, so the final product will probably differ, but you do get a sense of the new UI’s clean look.

WebXPRT screen shot

As we’ve said before, we’re committed to making sure that WebXPRT runs in most browsers and produces results that are useful for comparing web browsing performance across a wide variety of devices. We appreciate the feedback we’ve gotten so far, and are happy to receive more. Do you have ideas for the next WebXPRT? Let us know!

Justin

Looking for performance clues

We’ve written before about how the operating system and other software can influence test scores and even battery life. Benchmarks like the XPRTs provide overall results, but teasing out which factors affect those results may require some detective work. The key is to collect individual data points as evidence to what may be causing performance changes.

The Apple iOS 11 rollout last month is an excellent example of the effect of software on device performance. Angry tweets started almost immediately after the update, claiming that iOS 11 drained device batteries. iPhone users here at PT experienced similar issues. What was the cause of that performance drop? The hardware remained the same. So, did software cause the problem?

Less than a week after the rollout, Mashable published an explanation of possible causes. The article quotes research from mobile security firm Wandera showing that, for the 50,000 “moderate to heavy iPhone and iPad users” in the study, devices running iOS 11 burned through their battery at much faster rates than the same devices running iOS 10. They cite two possible causes:

    • devices often re-categorize the files stored on them for every new OS install, which may account for some of the battery issues.
    • many apps are not optimized for iOS 11 yet.

 

While these explanations make sense, with a little more digging, we could get closer to actually solving the mystery instead of guessing at the causes. After all, it is also possible that people are using iOS 11 differently from iOS 10. So, how could a dedicated sleuth investigate further? Anyone using benchmarks and hands-on testing to sift through various scenarios and configurations could get us closer to solving this mystery and any other software-based performance anomalies. But it’s a daunting task—changing only one variable at a time and recording the results is like pounding the streets and knocking on doors to solve a case.

In all likelihood, some combination of Apple iOS updates and application changes will improve the battery life for iOS 11. In the meantime, we wish we had an XPRT that could test battery life on iOS. Who knows, maybe some future version of WebXPRT will be able to help in future sleuthing.

Eric

Machine learning performance tool update

Earlier this year we started talking about our efforts to develop a tool to help in evaluating machine learning performance. We’ve given some updates since then, but we’ve also gotten some questions, so I thought I’d do my best to summarize our answers for everyone.

Some have asked what kinds of algorithms we’ve been looking into. As we said in an earlier blog, we’re looking at  algorithms involved in computer vision, natural language processing, and data analytics, particularly different aspects of computer vision.

One seemingly trivial question we’ve received regards the proposed name, MLXPRT. We have been thinking of this tool as evaluating machine learning performance, but folks have raised a valid concern that it may well be broader than that. Does machine learning include deep learning? What about other artificial intelligence approaches? I’ve certainly seen other approaches lumped into machine learning, probably because machine learning is the hot topic of the moment. It feels like everything is boasting, “Now with machine learning!”

While there is some value in being part of such a hot movement, we’ve begun to wonder if a more inclusive name, such as AIXPRT, would be better. We’d love to hear your thoughts on that.

We’ve also had questions about the kind of devices the tool will run on. The short answer is that we’re concentrating on edge devices. While there is a need for server AI/ML tools, we’ve been focusing on the evaluating the devices close to the end users. As a result, we’re looking at the inference aspect of machine learning rather than the training aspect.

Probably the most frequent thing we’ve been asked about is the timetable. While we’d hoped to have something available this year, we were overly optimistic. We’re currently working on a more detailed proposal of what the tool will be, and we aim to make that available by the end of this year. If we achieve that goal, our next one will be to have a preliminary version of the tool itself ready in the first half of 2018.

As always, we seek input from folks, like yourself, who are working in these areas. What would you most like to see in an AI/machine learning performance tool? Do you have any questions?

Bill 

Check out the other XPRTs:

Forgot your password?