
AIXPRT: We want your feedback!

Today, we’re publishing the AIXPRT Request for Comments (RFC) document. The RFC explains the need for a new artificial intelligence (AI)/machine learning benchmark, shows how the BenchmarkXPRT Development Community plans to address that need, and provides preliminary design specifications for the benchmark.

We’re seeking feedback and suggestions from anyone interested in shaping the future of machine learning benchmarking, including those not currently part of the Development Community. Usually, only members of the BenchmarkXPRT Development Community have access to our RFCs and the opportunity to provide feedback. However, because we’re seeking input from non-members who have expertise in this field, we will be posting this RFC in the New events & happenings section of the main BenchmarkXPRT.com page and making it available at AIXPRT.com.

We welcome input on all aspects of the benchmark, including scope, workloads, metrics and scores, UI design, and reporting requirements. We will accept feedback through May 13, 2018, after which BenchmarkXPRT Development Community administrators will collect and evaluate the feedback and publish the final design specification.

Please share the RFC with anyone interested in machine learning benchmarking and please send us your feedback before May 13.

Justin

Comparing open source and open development

Why do we use open development when designing and building the XPRTs, and what’s the difference between our open development approach and traditional open-source methods? The terminology around these two models can be confusing, so we wanted to review some similarities and differences.

Why open development?

An open development approach helps encourage collaboration, innovation, and transparency. XPRT community members get involved in the development of each benchmark from the beginning:

  • They submit suggestions, questions, and concerns that inform the future design of the tools.
  • They view early proposals for new versions and contribute comments for the final design.
  • They suggest new workloads.
  • They have access to community previews (beta builds) of the tools.
  • They submit source code for inclusion in the benchmarks.
  • They examine existing source code.

A commitment to transparency

Because we’re committed to publishing reliable, unbiased benchmarks, we also want to make the XPRT development process as transparent as possible. It’s not unusual for people to claim that any given benchmark contains hidden biases. To address this concern, we make our source code available to anyone who joins the community. This approach reduces the risk of unforeseen bias in our benchmarks.

Quality control

Unlike open-source models, open development allows us to control derivative works, which can be important in benchmarking. While open source encourages a constantly evolving product that may fork into substantially different versions, benchmarking requires a product that remains static to enable valid comparisons over time. By controlling derivative works, we can avoid the problem of unauthorized versions of the benchmarks being published as “XPRTs.”

In the future, we may use a traditional open-source model for specific XPRTs or other projects. If we do, we’ll share our reasoning with the community and ask for their thoughts about the best way to proceed. If you’re not a community member, but are interested in benchmark development, we encourage you to join today!

Justin

New opportunities for TouchXPRT

Next week’s XPRT Weekly Tech Spotlight will feature a unique device: the HP Envy x2 2-in-1. The first device of its kind on the market, the Envy x2 runs Windows 10 on ARM hardware—in this case, a Qualcomm Snapdragon 835 platform. ASUS and Lenovo will release similar devices in the coming months. Using the ARM chips found in many flagship phones, these devices aim to power robust operating systems on 2-in-1s and laptops while providing extended battery life and always-on LTE connections.

These new devices bring ample opportunities for benchmarking. Consumers will want to know about potential trade-offs between price, power, and battery life—incentivizing tech reviewers to dive into the details and provide useful data points. But for the new Windows on ARM systems, the usual benchmarks have presented challenges. Many traditional laptop benchmarks just won’t work on the new systems. TouchXPRT, however, works like a charm.

TouchXPRT assesses performance on any Windows device. Since it’s a Universal Windows Platform (UWP) app that runs on both x86 and ARM systems, it can evaluate how well a Windows device running on ARM hardware performs compared to traditional laptops and 2-in-1s. It’s easy to install, takes about 15 minutes to run, and you can download it directly from TouchXPRT.com or install it from the Microsoft Store. Labs can also automate testing using the command line or a script.

If you’ve been looking for a Windows performance evaluation tool that’s easy to use and has the flexibility of a UWP app, give TouchXPRT a try. Read more details about TouchXPRT here, and please don’t hesitate to contact us with any questions you may have.

Justin

The XPRTs in action

In the near future, we’ll update our “XPRTs around the world” infographic, which provides a snapshot of how people are using the XPRTs worldwide. Among other stats, we include the number of XPRT web mentions, articles, and reviews that have appeared during a given period. Recently, we learned how one of those statistics—a single web site mention of WebXPRT—found its way to consumers in more places than we would have imagined.

Late last month, AnandTech published a performance comparison by Andrei Frumusanu examining the Samsung Galaxy S9’s Snapdragon 845 and Exynos 9810 variants and a number of other high-end phones. WebXPRT was one of the benchmarking tools used. The article stated that both versions of the brand-new S9 were slower than the iPhone X and, in some tests, were slower than even the iPhone 7.

A CNET video discussed the article and the role of WebXPRT in the performance comparison, and the article has been reposted to hundreds of tech media sites around the world. A quick survey shows reposts in Albania, Bulgaria, Denmark, Chile, the Czech Republic, France, Germany, Greece, Indonesia, Iran, Italy, Japan, Korea, Poland, Russia, Spain, Slovakia, Turkey, and many other countries.

The popularity of the article is not surprising, for it positions the newest flagship phones from the industry’s two largest phone makers in a head-to-head comparison with a somewhat unexpected outcome. AnandTech did nothing to stir controversy or sensationalize the test results, but simply provided readers with an objective, balanced assessment of how these devices compare so that they could draw their own conclusions. The XPRTs share this approach.

We’re grateful to Andrei and others at AnandTech who’ve used the XPRTs over the years to produce content that helps consumers make informed decisions. WebXPRT is just part of AnandTech’s toolkit, but it’s one that’s accessible to anybody free of charge. With the help of BenchmarkXPRT Development Community members, we’ll continue to publish XPRT tools that help users everywhere gain valuable insight into device performance.

Justin

How to automate WebXPRT 3 testing

Yesterday, we published the WebXPRT 3 release notes, which contain instructions on how to run the benchmark, submit results, and use automation to run the tests.

Test automation is a helpful feature that lets you use scripts to run WebXPRT 3 and control specific test parameters. Below, you’ll find a description of those parameters and instructions for using automation.

Test type

The WebXPRT automation framework is designed to account for two test types: the six core workloads and any experimental workloads we might add in future builds. There are currently no experimental tests in WebXPRT 3, so the test type variable should always be set to 1.

  • Core tests: 1
  • Experimental tests: 2

Test scenario

This parameter lets you specify which tests to run by using the following codes:

  • Photo enhancement: 1
  • Organize album using AI: 2
  • Stock option pricing: 4
  • Encrypt notes and OCR scan: 8
  • Sales graphs: 16
  • Online homework: 32

To run an individual test, use its code. To run multiple tests, use the sum of their codes. For example, to run Stocks (4) and Notes (8), use the sum of 12. To run all core tests, use 63, the sum of all the individual test codes (1 + 2 + 4 + 8 + 16 + 32 = 63).
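Because the scenario codes are powers of two, any combination of tests maps to a unique sum, which is the same as a bitwise OR of the codes. A minimal Python sketch of that arithmetic (the dictionary keys and the `scenario_code` helper are illustrative names, not part of WebXPRT):

```python
# WebXPRT 3 test scenario codes, as listed in the release notes.
SCENARIOS = {
    "photo_enhancement": 1,
    "organize_album_ai": 2,
    "stock_option_pricing": 4,
    "encrypt_notes_ocr": 8,
    "sales_graphs": 16,
    "online_homework": 32,
}

def scenario_code(*names):
    """Combine individual scenario codes into the single value the
    automation URL expects. Since each code is a distinct power of
    two, a bitwise OR of the codes equals their sum."""
    code = 0
    for name in names:
        code |= SCENARIOS[name]
    return code

# Stocks (4) + Notes (8) -> 12
print(scenario_code("stock_option_pricing", "encrypt_notes_ocr"))  # 12

# All six core tests -> 63
print(scenario_code(*SCENARIOS))  # 63
```

The same property works in reverse: given a combined value such as 12, checking `value & code` for each scenario tells you exactly which tests it selects.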

Results format

This parameter lets you select the format of the results:

  • Display the result as an HTML table: 1
  • Display the result as XML: 2
  • Display the result as CSV: 3
  • Download the result as CSV: 4

To use the automation feature, start with the URL http://www.principledtechnologies.com/benchmarkxprt/webxprt/2018/3_v5/auto.php, append a question mark (?), and add the parameters and values separated by ampersands (&). For example, to run all the core tests and download the results, you would use the following URL: http://www.principledtechnologies.com/benchmarkxprt/webxprt/2018/3_v5/auto.php?testtype=1&tests=63&result=4
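A script can assemble that URL from the three parameters rather than concatenating strings by hand. A short Python sketch, assuming the parameter names `testtype`, `tests`, and `result` exactly as they appear in the example URL above (the `webxprt_url` helper itself is a hypothetical name):

```python
from urllib.parse import urlencode

BASE = "http://www.principledtechnologies.com/benchmarkxprt/webxprt/2018/3_v5/auto.php"

def webxprt_url(testtype=1, tests=63, result=4):
    """Build a WebXPRT 3 automation URL.

    testtype: 1 = core tests (the only valid value in WebXPRT 3)
    tests:    sum of the scenario codes to run (63 = all six)
    result:   results format (4 = download results as CSV)
    """
    query = urlencode({"testtype": testtype, "tests": tests, "result": result})
    return f"{BASE}?{query}"

print(webxprt_url())
# http://www.principledtechnologies.com/benchmarkxprt/webxprt/2018/3_v5/auto.php?testtype=1&tests=63&result=4

# Run only Stocks (4) and Notes (8), displaying results as XML:
print(webxprt_url(tests=4 + 8, result=2))
```

You could then open the generated URL in a browser, or pass it to a browser-automation tool, to kick off an unattended run.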

We hope WebXPRT’s automation features will make testing easier for you. If you have any questions about WebXPRT or the automation process, please feel free to ask!

Justin

MWC18 and technology on the brink

This year’s Mobile World Congress in Barcelona bristled with technologies on the brink of superstardom. The long-awaited 5G high-speed mobile standard again dominated the conversations, and is one year closer to creating a world of high-speed connections that will make possible mobile usages we’ve only begun to discover. Intelligent, connected cars promise a self-driving and highly interconnected automotive experience that should ultimately make driving better for all of us. Artificial intelligence, already a star, showed glimmers of its vast and still barely tapped potential. In keeping with the show’s name, mobile devices of all sorts proved that phones and tablets and laptops are nowhere near done, with new models and capabilities available all over the many halls that made up the MWC campus.

Each of those technologies will continue to evolve rapidly over the coming years, and each will create new opportunities for us all to benefit. Those opportunities will appear both in ways we understand now—faster connections and quicker devices, for example—and in fashions we don’t yet understand. The new benefits will lead to new usage models, change the ways we interact with the world, and create whole new markets. (When the first smartphones appeared, they changed photography forever, but that wasn’t their primary goal.) These new technologies will help us in ways we can now only glimpse.

These changes and new capabilities will breed both competition and, inevitably, confusion. How are we to know which of the new products deliver the best implementations of these technologies heading toward stardom, and how are we to know when to upgrade to new generations of these products?

Answering those questions, and clarifying some of the confusing aspects of the ever-shifting tech market, are the reasons the XPRT community and tools exist. New tech creates new usage models that require new tools to assess: XPRT tools.

If there’s one last lesson I learned from MWC18, it’s that our work is only just beginning. The new technologies that are on the brink today will become superstars soon, and we’ll be there with the tools you need to assess and compare them.

Mark
