Up next for WebXPRT 4: A new AI-focused workload!

We’re always thinking about ways to improve WebXPRT. In the past, we’ve discussed the potential benefits of auxiliary workloads and the role that such workloads might play in future WebXPRT updates and versions. Today, we’re very excited to announce that we’ve decided to move forward with the development of a new WebXPRT 4 workload focused on browser-side AI technology!

WebXPRT 4 already includes timed AI tasks in two of its workloads: the Organize Album Using AI workload and the Encrypt Notes and OCR Scan workload. These two workloads reflect the types of light browser-side inference tasks that have been available for a while now, but most heavy-duty inference on the web has historically happened on on-prem servers or in the cloud. Now, localized AI technology is growing by leaps and bounds, and the integration of new AI capabilities with browser-based tasks is poised to advance rapidly.

Because of this growth, we believe now is the time to start work on giving WebXPRT 4 the ability to evaluate new browser-based AI capabilities, which are likely to become a part of everyday life in the next few years. We haven’t yet decided on a test scenario or software stack for the new workload, but we’ll be working to refine our plan in the coming months. There seems to be some initial promise in emerging frameworks such as ONNX Runtime Web, which allows users to run and deploy web-based machine learning models by using JavaScript APIs and libraries. In addition, new Web APIs like WebGPU (currently supported in Edge and Chrome, and available in Safari Technology Preview) and WebNN (in development) may soon help facilitate new browser-side AI workloads.
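
To make the browser-support question concrete, here is a minimal sketch of how a page might check whether the WebGPU and WebNN entry points are exposed. It is an illustration only, not part of WebXPRT 4. The helper name detectAiApis and the AiApiSupport type are hypothetical, and the check relies on the assumption that WebGPU is reached through navigator.gpu and WebNN through navigator.ml, as their specifications describe. The presence of a property does not guarantee that a usable device or backend is available.

```typescript
// Hypothetical helper: reports which browser-side AI entry points are exposed.
// It takes the navigator-like object as a parameter so it can also run
// outside a browser (for example, in a unit test).
interface AiApiSupport {
  webgpu: boolean; // WebGPU is exposed as `navigator.gpu`
  webnn: boolean;  // WebNN (still in development) is exposed as `navigator.ml`
}

function detectAiApis(nav: object | null | undefined): AiApiSupport {
  if (nav === null || nav === undefined) {
    return { webgpu: false, webnn: false };
  }
  return {
    webgpu: "gpu" in nav,
    webnn: "ml" in nav,
  };
}

// In a browser, the call would be: detectAiApis(navigator)
```

A check like this only indicates whether a workload could attempt to run in a given browser. A real workload would still need to request an adapter or context and handle failure.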

We know that many longtime WebXPRT 4 users will have questions about how this new workload may affect their tests. We want to assure you that the workload will be an optional bonus workload and will not run by default during normal WebXPRT 4 tests. As you consider possibilities for the new workload, here are a few points to keep in mind:

  • The workload will be optional for users to run.
  • It will not affect the main WebXPRT 4 subtest or overall scores in any way.
  • It will run separately from the main test and will produce its own score(s).
  • Current and future WebXPRT 4 results will still be comparable to one another, so users who’ve already built a database of WebXPRT 4 scores will not have to retest their devices.
  • Because many of the available frameworks don’t currently run on all browsers, the workload may not run on every platform.

As we research available technologies and explore our options, we would love to hear from you. If you have ideas for an AI workload scenario that you think would be useful or thoughts on how we should implement it, please let us know! We’re excited about adding new technologies and new value to WebXPRT 4, and we look forward to sharing more information here in the blog as we make progress.


Local AI and new frontiers for performance evaluation

Recently, we discussed some ways the PC market may evolve in 2024, and how new Windows on Arm PCs could present the XPRTs with many opportunities for benchmarking. In addition to a potential market shakeup from Arm-based PCs in the coming years, there’s a much broader emerging trend that could eventually revolutionize almost everything about the way we interact with our personal devices—the development of local, dedicated AI processing units for consumer-oriented tech.

AI already impacts daily life for many consumers through technologies such as predictive text, computer vision, adaptive workflow apps, voice recognition, smart assistants, and much more. Generative AI-based technologies are rapidly establishing a permanent, society-altering presence across a wide range of industries. Aside from some localized inference tasks that the CPU and/or GPU typically handle, the bulk of the heavy compute power that fuels those technologies has been in the cloud or in on-prem servers. Now, several major chipmakers are working to roll out their own versions of AI-optimized neural processing units (NPUs) that will enable local devices to take on a larger share of the AI load.

Examples of dedicated AI hardware in recently released or upcoming consumer devices include Intel’s new Meteor Lake NPU, Apple’s Neural Engine for M-series SoCs, Qualcomm’s Hexagon NPU, and AMD’s XDNA 2 architecture. The potential benefits of localized, NPU-facilitated AI are straightforward. On-device AI could reduce power consumption and extend battery life by offloading AI tasks from the CPU. It could alleviate certain cloud-related privacy and security concerns. Without the delays inherent in cloud queries, localized AI could execute inference tasks much closer to real time. NPU-powered devices could fine-tune applications around your habits and preferences, even while offline. You could pull and use relevant data from cloud-based datasets without pushing private data in return. Theoretically, your device could know a great deal about you and enhance many areas of your daily life without passing all that data to another party.

Will localized AI play out that way? Some tech companies envision a role for on-device AI that enhances the abilities of existing cloud-based subscription services without decoupling personal data. We’ll likely see a wide variety of capabilities and services on offer, with application-specific and SaaS-determined privacy options.

Regardless of the way on-device AI technology evolves in the coming years, it presents an exciting new frontier for benchmarking. Not all NPUs will be created equal, and that’s something buyers will need to understand. Some vendors will optimize their hardware more for computer vision, or large language models, or AI-based graphics rendering, and so on. It won’t be enough for businesses and consumers to simply know that a new system has dedicated AI processing abilities. They’ll need to know if that system performs well while handling the types of AI-related tasks that they do every day.

Here at the XPRTs, we specialize in creating benchmarks built around real-world scenarios that mirror the tasks people do in their daily lives. That approach means that when people use XPRT scores to compare device performance, they’re using a metric that can help them make a buying decision that will benefit them every day. We look forward to exploring ways that we can bring XPRT benchmarking expertise to the world of on-device AI.

Do you have ideas for future localized AI workloads? Let us know!


The XPRTs will be at Mobile World Congress later this month!

Mobile World Congress (MWC) 2023 kicks off on February 27th, and we’re excited that Mark Van Name will be attending the event for the first time since the last pre-pandemic show in 2019. Each year, MWC offers a great opportunity to examine the new trends and technologies that will shape mobile technology in the years to come. The major themes of this year’s show include the latest advances in 5G and IoT technologies, along with what GSMA is calling “Reality+.” Reality+ refers to the intersection of AI, AR, VR, and 5G, and the potential impacts of these immersive technologies on our future.

Mark will be sharing his thoughts from this year’s show here in the XPRT blog, so be sure to stay tuned. Will you be attending MWC this year? If so, let us know!


A note about AIXPRT

Recently, a member of the tech press asked us about the status of AIXPRT, our benchmark that measures machine learning inference performance. We want to share our answer here in the blog for the benefit of other readers. The writer said it seemed like we had not updated AIXPRT in a long time, and wondered whether we had any immediate plans to do so.

It’s true that we haven’t updated AIXPRT in quite some time. Unfortunately, while a few tech press publications and OEM labs began experimenting with AIXPRT testing, the benchmark never got the traction we hoped for, and we’ve decided to invest our resources elsewhere for the time being. The AIXPRT installation packages are still available for people to use or reference as they wish, but we have not updated the benchmark to work with the latest platform versions (OpenVINO, TensorFlow, etc.). It’s likely that several components in each package are out of date.

If you are interested in AIXPRT and would like us to bring it up to date, please let us know. We can’t promise that we’ll revive the benchmark, but your feedback could be a valuable contribution as we try to gauge the benchmarking community’s interest.


Here’s what to expect in the WebXPRT 4 Preview

A few months ago, we shared detailed information about the changes we expected to make in WebXPRT 4. We are currently doing internal testing of the WebXPRT 4 Preview build in preparation for releasing it to the public. We want to let our readers know what to expect.

We’ve made some changes since our last update, and some of the details we present below could still change before the preview release. However, we are much closer to the final product. Once we release the WebXPRT 4 Preview, testers will be able to publish scores from Preview build testing. We will limit any changes that we make between the Preview and the final release to the UI or to features that we do not expect to affect test scores.

General changes

Some of the non-workload changes we’ve made in WebXPRT 4 relate to our typical benchmark update process.

  • We have updated the aesthetics of the WebXPRT UI to make WebXPRT 4 visually distinct from older versions. We did not significantly change the flow of the UI.
  • We have updated content in some of the workloads to reflect changes in everyday technology, such as upgrading most of the photos in the photo processing workloads to higher resolutions.
  • We have not yet added a looping function to the automation scripts, but are still considering it for the future.
  • We investigated the possibility of shortening the benchmark by reducing the default number of iterations from seven to five, but have decided to stick with seven iterations to ensure that score variability remains acceptable across all platforms.
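
As a rough illustration of what "score variability" can mean across iterations, the sketch below computes the coefficient of variation (sample standard deviation divided by the mean) for a set of per-iteration scores. This is an assumed, generic statistic chosen for illustration. The post does not say which variability measure WebXPRT uses, and the function name coefficientOfVariationPct is hypothetical.

```typescript
// Hypothetical helper: coefficient of variation of per-iteration scores,
// expressed as a percentage. Lower values mean more repeatable results.
function coefficientOfVariationPct(scores: number[]): number {
  if (scores.length < 2) {
    throw new Error("need at least two iteration scores");
  }
  const mean = scores.reduce((sum, x) => sum + x, 0) / scores.length;
  // Sample variance (divides by n - 1).
  const variance =
    scores.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (scores.length - 1);
  return (Math.sqrt(variance) / mean) * 100;
}
```

With fewer iterations, a statistic like this is estimated from fewer samples, which is one reason a shorter run can look less stable on some platforms.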

Workload changes

  • Photo Enhancement. We increased the efficiency of the workload’s Canvas object creation function, and replaced the existing photos with new, higher-resolution photos.
  • Organize Album Using AI. We replaced ConvNetJS with WebAssembly (WASM)-based OpenCV.js for both the face detection and image classification tasks. We changed the images for the image classification tasks to images from the ImageNet dataset.
  • Stock Option Pricing. We updated the dygraph.js library.
  • Sales Graphs. We made no changes to this workload.
  • Encrypt Notes and OCR Scan. We replaced ASM.js with WASM for the Notes task and updated the WASM-based Tesseract version for the OCR task.
  • Online Homework. In addition to the existing scenario, which uses four Web Workers, we have added a scenario with two Web Workers. The workload now covers a wider range of Web Worker performance, and we calculate the score by using the combined run time of both scenarios. We also updated the typo.js library.
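
The Online Homework change above can be sketched as follows. The sketch assumes that "combined run time" means the simple sum of the two scenarios' run times. The ScenarioRun type and the combinedRunTimeMs function are hypothetical names, and the step that converts run time into a WebXPRT score is not described here, so it is left out.

```typescript
// Hypothetical record of one Online Homework scenario run.
interface ScenarioRun {
  workers: number;   // number of Web Workers the scenario uses (2 or 4)
  runTimeMs: number; // measured run time in milliseconds
}

// Assumption: "combined run time" is the sum of the scenario run times.
function combinedRunTimeMs(runs: ScenarioRun[]): number {
  return runs.reduce((total, run) => total + run.runTimeMs, 0);
}

// Example with made-up timings for the two-worker and four-worker scenarios.
const exampleRuns: ScenarioRun[] = [
  { workers: 2, runTimeMs: 1500 },
  { workers: 4, runTimeMs: 900 },
];
const exampleTotalMs = combinedRunTimeMs(exampleRuns); // 2400
```

Summing the two scenarios means that a browser that scales poorly from two to four Web Workers is penalized in the same figure as one that is slow overall.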

Experimental workloads

As part of the WebXPRT 4 development process, we researched the possibility of including two new workloads: a natural language processing (NLP) workload and an Angular-based message scrolling workload. After much testing and discussion, we have decided not to include these two workloads in WebXPRT 4. They will be good candidates for us to add as experimental WebXPRT 4 workloads in 2022.

The release timeline

Our goal is to publish the WebXPRT 4 preview build by December 15th, which will allow testers to publish scores in the weeks leading up to the Consumer Electronics Show in Las Vegas in January 2022. We will provide more detailed information about the GA timeline here in the blog as soon as possible.

If you have any questions about the details we’ve shared above, please feel free to ask!


Thinking about experimental WebXPRT workloads in 2022

As the WebXPRT 4 development process has progressed, we’ve started to discuss the possibility of offering experimental WebXPRT 4 workloads in 2022. These would be optional workloads that test cutting-edge browser technologies or new use cases. The individual scores for the experimental workloads would stand alone, and would not factor in the WebXPRT 4 overall score.
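
One way to picture the separation described above is to keep experimental results in their own list so that only core workloads feed the overall score. The sketch below is an assumption-laden illustration, not WebXPRT code. The WorkloadResult type, the experimental flag, and the splitResults function are all hypothetical, and the calculation of the overall score itself is deliberately omitted because it is not described here.

```typescript
// Hypothetical result record for a single workload.
interface WorkloadResult {
  name: string;
  score: number;
  experimental: boolean; // true for optional, stand-alone workloads
}

// Splits results so that experimental scores never reach the
// overall-score calculation and are reported on their own.
function splitResults(results: WorkloadResult[]): {
  core: WorkloadResult[];
  experimental: WorkloadResult[];
} {
  return {
    core: results.filter((r) => !r.experimental),
    experimental: results.filter((r) => r.experimental),
  };
}
```

Only the core list would be passed to whatever computes the overall score, which is why adding or skipping an experimental workload would leave that score unchanged.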

WebXPRT testers would be able to run the experimental workloads one of two ways: by manually selecting them on the benchmark’s home screen, or by adjusting a value in the WebXPRT 4 automation scripts.

Testers would benefit from experimental workloads by being able to compare how well certain browsers or systems handle new tasks (e.g., new web apps or AI capabilities). We would benefit from fielding workloads for large-scale testing and user feedback before we commit to including them as core WebXPRT workloads.

Do you have any general thoughts about experimental workloads for browser performance testing, or any specific workloads that you’d like us to consider? Please let us know.


