BenchmarkXPRT Blog banner

Category: large language models

Web APIs: Possible paths for the AI-focused WebXPRT 4 auxiliary workload

In our last blog post, we discussed one of the major decision points we’re facing as we work on what we hope will be the first new AI-focused WebXPRT 4 auxiliary workload: choosing a Web AI framework. In today’s blog, we’re discussing another significant decision that we need to make for the future workload’s development path: choosing a web API.

Many of you are familiar with the concept of an application programming interface (API). Simply put, APIs implement sets of software rules, tools, and/or protocols that serve as intermediaries that make it possible for different computer programs or components to communicate with each other. APIs simplify many development tasks for programmers and provide standardized ways for applications to share data, functions, and system resources.

Web APIs fulfill the intermediary role of an API—through HTTP-based communication—for web servers (on the server side) or web browsers (on the client side). Client-side web APIs make it possible for browser-based applications to expand browser functionality. They execute the kinds of JavaScript, HTML5, and WebAssembly (Wasm) workloads—among other examples—that support the wide variety of browser extensions many of us use every day. WebXPRT uses those types of browser-based workloads to evaluate system performance. To lay a solid foundation for the first future browser-based AI workload, we need to choose a web API that will be compatible with WebXPRT and the Web AI framework and AI inference workload(s) we ultimately choose.

Currently, there are three main web API paths for running AI inference in a web browser: Web Neural Network (WebNN), Wasm, and WebGPU. These three web technologies are in various stages of development and standardization. Each has different levels of support within the major browsers. Here are basic overviews of each of the three options, as well as a few of our thoughts on the benefits and limitations that each may bring to the table for a future WebXPRT AI workload:

  • WebNN is a JavaScript API that enables developers to directly execute machine learning (ML) tasks on neural networks within web-based applications. WebNN makes it easier to integrate ML models into web apps, and it allows web apps to leverage the power of neural processing units (NPUs). WebNN has a lot going for it. It’s hardware-agnostic and works with various ML frameworks. It’s likely to be a major player in future browser-based inference applications. However, as a web standard, WebNN is still in the development stage and is only available in developer previews for Chromium-based browsers. Full default WebNN support could take a year or more.
  • Wasm is a binary instruction format that works across all modern browsers. Wasm provides a sandboxed environment that operates at near-native speeds and takes advantage of common hardware specs across platforms. Wasm’s capabilities offer web developers a great deal of flexibility for running complex client applications in the browser. Simply put, Wasm can help developers adapt their existing code for additional platforms and browser-based applications without requiring extensive code rewrites. Wasm’s flexibility and cross-platform compatibility is one of the reasons that we’ve already made use of Wasm in two existing WebXPRT 4 workloads that feature AI tasks: Organize Album using AI, and Encrypt Notes and OCR Scan. Wasm can also work together with other web APIs, such as WebGPU.
  • WebGPU enables web-based applications to directly access the graphics rendering and computational capabilities of a system’s GPU. The parallel computational abilities of GPUs make them especially well-suited to efficiently handle some of the demands of AI inference workloads, including image-based GenAI workloads or large language models. Google Chrome and Microsoft Edge currently support WebGPU, and it’s available in Safari through a tech preview.

Right now, we don’t think that WebNN will be fully out of the development phase in time to serve as our go-to web API for a new WebXPRT AI workload. Wasm and/or WebGPU appear to our best options for now. When WebNN is fully baked and available in mainstream browsers, it’s possible that we could port any existing Wasm- or WebGPU-based WebXPRT AI workloads to WebNN, which may open the possibility of cross-platform browser-based NPU performance comparisons.

All that said and as we mentioned in our previous post about Web AI frameworks, we have not made any final decisions about a web API or any aspect of the future workload. We’re still in the early stages of this project. We want your input.

If this discussion has sparked web AI ideas that you think would benefit the process, or if you have feedback you’d like to share, please feel free to contact us!

Justin

Local AI and new frontiers for performance evaluation

Recently, we discussed some ways the PC market may evolve in 2024, and how new Windows on Arm PCs could present the XPRTs with many opportunities for benchmarking. In addition to a potential market shakeup from Arm-based PCs in the coming years, there’s a much broader emerging trend that could eventually revolutionize almost everything about the way we interact with our personal devices—the development of local, dedicated AI processing units for consumer-oriented tech.

AI already impacts daily life for many consumers through technologies such as such as predictive text, computer vision, adaptive workflow apps, voice recognition, smart assistants, and much more. Generative AI-based technologies are rapidly establishing a permanent, society-altering presence across a wide range of industries. Aside from some localized inference tasks that the CPU and/or GPU typically handle, the bulk of the heavy compute power that fuels those technologies has been in the cloud or in on-prem servers. Now, several major chipmakers are working to roll out their own versions of AI-optimized neural processing units (NPUs) that will enable local devices to take on a larger share of the AI load.

Examples of dedicated AI hardware in recently-released or upcoming consumer devices include Intel’s new Meteor Lake NPU, Apple’s Neural Engine for M-series SoCs, Qualcomm’s Hexagon NPU, and AMD’s XDNA 2 architecture. The potential benefits of localized, NPU-facilitated AI are straightforward. On-device AI could reduce power consumption and extend battery life by offloading those tasks from the CPUs. It could alleviate certain cloud-related privacy and security concerns. Without the delays inherent in cloud queries, localized AI could execute inference tasks that operate much closer to real time. NPU-powered devices could fine-tune applications around your habits and preferences, even while offline. You could pull and utilize relevant data from cloud-based datasets without pushing private data in return. Theoretically, your device could know a great deal about you and enhance many areas of your daily life without passing all that data to another party.

Will localized AI play out that way? Some tech companies envision a role for on-device AI that enhances the abilities of existing cloud-based subscription services without decoupling personal data. We’ll likely see a wide variety of capabilities and services on offer, with application-specific and SaaS-determined privacy options.

Regardless of the way on-device AI technology evolves in the coming years, it presents an exciting new frontier for benchmarking. All NPUs will not be created equal, and that’s something buyers will need to understand. Some vendors will optimize their hardware more for computer vision, or large language models, or AI-based graphics rendering, and so on. It won’t be enough for business and consumers to simply know that a new system has dedicated AI processing abilities. They’ll need to know if that system performs well while handling the types of AI-related tasks that they do every day.

Here at the XPRTs, we specialize in creating benchmarks that feature real-world scenarios that mirror the types of tasks that people do in their daily lives. That approach means that when people use XPRT scores to compare device performance, they’re using a metric that can help them make a buying decision that will benefit them every day. We look forward to exploring ways that we can bring XPRT benchmarking expertise to the world of on-device AI.

Do you have ideas for future localized AI workloads? Let us know!

Justin

Check out the other XPRTs:

Forgot your password?