Category: What makes a good benchmark?

Creating a machine-learning benchmark

Recently, we wrote about one of the most exciting emerging technology areas, machine learning, and the question of what role the XPRTs could play in the field.

Experts expect machine learning to be the analytics backbone of the IoT data explosion. It is a disruptive technology with the potential to influence a broad range of industries. Consumer and industrial applications that take advantage of machine-learning advancements in computer vision, natural language processing, and data analytics are already available, and many more are on the way.

Currently, there is no comprehensive machine-learning or deep-learning benchmark that includes home, automotive, industrial, and retail use cases. The challenge with developing one is that these are still the early days of the technology: a fragmented software and hardware landscape and a lack of standardized implementations make benchmarking machine learning complex and challenging.

Based on the conversations we’ve had over the last few weeks, we’ve decided to take on that challenge. With the community’s help, of course!

As we outlined in a blog entry last month, we will work with interested folks in the community, key vendors, and academia to pull together what we are internally calling MLXPRT.

While the result may differ substantially from the existing XPRTs, we think the need is great. Whether MLXPRT turns out to be a packaged tool or just sample code and workloads remains to be seen.

What we need most is your help. We need both general input about what you would like to see and any expertise you may have to offer. Let us know any questions you have or ways you can help.

On a related note, I’ll be at CES 2017 in Las Vegas during the first week of January. I’d love to meet and talk more about machine learning, benchmarking, or the XPRTs. If you’re planning to be there and would like to connect, let us know.

Because of the holidays, we will not have a blog entry next week. We wish all of you a wonderful time with your families and a great start to the new year.

Bill

HDXPRT’s future

While industry pundits have written many words about the death of the PC, Windows PCs are going through a renaissance. No longer do you simply choose between a desktop and a laptop in beige or black; there has been an explosion of choices.

Whether you want a super-thin notebook, a tablet, or a two-in-one device, the market has something to offer. Desktop systems can be small devices on your desk, all-in-ones with the PC built into the monitor, or old-style boxes that sit on the floor. You can go with something inexpensive that will be sufficient for many tasks or invest in a super-powerful PC capable of driving today’s latest VR devices. Or you can get a new Microsoft Surface Studio, an example of the new types of devices entering the PC scene.

The current proliferation of PC choices means that tools that help buyers understand the performance differences between systems are more important than they have been in years. Because HDXPRT is one such tool, we expect demand for it to increase.

We have many tasks ahead of us as we prepare for this increased demand. The first is to release a version of HDXPRT 2014 that doesn’t require a patch. We are working on that and should have something ready later this month.

For the other tasks, we need your input. We believe we need to update HDXPRT to reflect the world of high-definition content. It’s tempting to simply change the name to UHDXPRT, but this was our first XPRT and I’m partial to the original name. How about you?

As for tests, what should a 2017 version of HDXPRT include? We think 4K-related workloads are a must, but we aren’t sure whether 4K playback tests are the way to go. What do you think? We also need to update other content, such as photo and video resolutions, and replace outdated applications with current versions. Would a VR test be worthwhile?

Please share your thoughts with us over the coming weeks as we put together a plan for the next version of HDXPRT!

Bill

An exciting milestone for WebXPRT!

If you’re familiar with the run counter on WebXPRT.com, you may have noticed that WebXPRT recently passed a pretty significant milestone. Since we released WebXPRT 2013, users running WebXPRT 2013 and 2015 have successfully completed over 100,000 runs!

We’re thrilled about WebXPRT’s ongoing popularity, and we think that it’s due to the benchmark’s unique combination of characteristics: it’s easy to run, it runs quickly and on a wide variety of platforms, and it evaluates device performance using real-world tasks. Manufacturers, developers, consumers, and media outlets in more than 358 cities, from Aberdeen to Zevenaar, and 57 countries, from Argentina to Vietnam, have used WebXPRT’s easy-to-understand results to compare how well devices handle everyday tasks. WebXPRT has definitely earned its reputation as a “go-to” benchmark.

If you haven’t run WebXPRT yet, give it a try. The test is free and runs in almost any browser.

We’re grateful for everyone who’s helped us reach this milestone. Here’s to another 100,000 runs!

Justin

Rebalancing our portfolio

We’ve written recently about the many new ways people are using their devices, the growing breadth of device types, and how application environments are also changing. We’ve been thinking a lot about the ways benchmarks need to adapt and what new tests we should be developing.

As part of this process, we’re reviewing the XPRT portfolio. One example we wrote about recently is Google’s announcement that it is bringing Android apps to Chrome OS and moving away from Chrome apps. Assuming the plan comes to fruition, it has big implications for CrXPRT, and possibly for WebXPRT as well. Another example: HDXPRT once included video playback tests, and the increasing importance of 4K video might mean we should bring them back.

As always, we’re interested in your thoughts. Which tests do you see as the most useful going forward? Which ones do you think might be past their prime? What new areas would you like to see us start to address? Let us know!

Over the coming weeks, we’ll share our conclusions based on these market forces and your feedback. We’re excited about the possibilities and hope you are as well.

Bill

Doing things a little differently

I enjoyed watching the Apple event live yesterday. There were some very impressive announcements. (And a few that were not so impressive; the Breathe app would get on my nerves really fast!)

One thing that really impressed me was the iPhone 7 Plus camera’s ability to create depth-of-field effects. Some of the photos demonstrated how the phone used machine learning to identify the people in a shot and keep them in focus while blurring the background, creating a shallow depth of field that makes the subjects really stand out. The way we take photos is not the only thing that’s changing: Apple also mentioned that machine learning is part of its QuickType keyboard, helping with “contextual prediction.”
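To make that idea concrete, here’s a minimal sketch of the compositing step, assuming a per-pixel subject mask has already been produced by some segmentation model. The grayscale pixels, box blur, and every name below are invented for illustration; this is not Apple’s implementation.

```typescript
// Sketch: keep masked subjects sharp and blur everything else.
// Assumes a segmentation model has already produced `mask`,
// where mask[i] is 1.0 for subject pixels and 0.0 for background.

type Image = { width: number; height: number; pixels: Float32Array };

// A 3x3 box blur stands in for a proper lens-blur ("bokeh") kernel.
function boxBlur(img: Image): Image {
  const { width, height, pixels } = img;
  const out = new Float32Array(pixels.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      let count = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          const nx = x + dx;
          const ny = y + dy;
          if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
            sum += pixels[ny * width + nx];
            count++;
          }
        }
      }
      out[y * width + x] = sum / count;
    }
  }
  return { width, height, pixels: out };
}

// Blend the sharp original over the blurred copy, weighted by the mask.
function depthOfFieldEffect(img: Image, mask: Float32Array): Image {
  const blurred = boxBlur(img);
  const out = new Float32Array(img.pixels.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = mask[i] * img.pixels[i] + (1 - mask[i]) * blurred.pixels[i];
  }
  return { width: img.width, height: img.height, pixels: out };
}
```

Real portrait modes go further, estimating per-pixel depth so the blur strength can vary with distance, but the mask-and-composite step above is the core of the effect.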

This is only one product announcement, but it’s a reminder that we need to constantly examine every part of the XPRTs. Recently, we talked a bit about how people will be using their devices in new ways in the coming months and how we need to develop tests for those new applications. However, we must also stay focused on keeping existing tests fresh. People will keep taking photos, but today’s photo-editing tests may not be relevant a year or two from now.

Were there any announcements yesterday that got you excited? Let us know!

Eric

Apples to apples?

PCMag published a great review of the Opera browser this week. In addition to looking at the many features Opera offers, the review included performance data from multiple benchmarks, which look at areas such as hardware graphics acceleration, WebGL performance, memory consumption, and battery life.

Three of the benchmarks have a significant, though not exclusive, focus on JavaScript performance: Google Octane 2.0, JetStream 1.1, and WebXPRT 2015. The three benchmarks did not rank the browsers the same way, and in the past, we’ve discussed some of the reasons why this happens. In addition to differences in the tests themselves, there are sometimes differences in approach that are worth considering.

For example, consider the test descriptions for JetStream 1.1. You’ll immediately notice that its tests are much lower-level than the ones in WebXPRT. Then consider these phrases from a few of the test descriptions:

  • code-first-load “…This test attempts to defeat the browser’s caching capabilities…”
  • splay-latency “Tests the worst-case performance…”
  • zlib “…modified to restrict code caching opportunities…”

While the XPRTs test typical performance for higher-level applications, the tests in JetStream are tweaked to stress devices in very specific ways, some of which are not typical. The information these tests provide can be very useful for engineers and developers, but it may not be as meaningful to the typical user.
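To see how the two approaches can diverge, here’s a minimal sketch that scores one workload two ways: by average iteration time, in the spirit of typical-performance testing, and by worst-case iteration time, in the spirit of a test like splay-latency. The workload and iteration counts are invented for illustration and aren’t taken from either benchmark.

```typescript
// Sketch: the same workload scored two ways. A browser whose engine
// occasionally pauses (e.g., for garbage collection) can post a good
// average yet a poor worst case, so the two scores can rank it differently.

function workload(): number {
  // Stand-in task; real benchmarks use far more representative work.
  let sum = 0;
  for (let i = 0; i < 100_000; i++) {
    sum += Math.sqrt(i);
  }
  return sum;
}

function score(iterations: number): { averageMs: number; worstMs: number } {
  const times: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    workload();
    times.push(performance.now() - start);
  }
  const averageMs = times.reduce((a, b) => a + b, 0) / times.length;
  const worstMs = Math.max(...times); // worst-case iteration latency
  return { averageMs, worstMs };
}

const { averageMs, worstMs } = score(50);
console.log(`average: ${averageMs.toFixed(3)} ms, worst case: ${worstMs.toFixed(3)} ms`);
```

A typical-performance benchmark weights its score toward the first number; a worst-case test reports something closer to the second, which is one reason the same browsers can rank differently on each.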

I have to stress that both approaches are valid, but they are doing somewhat different things. There’s a cliché about comparing apples to apples, but not all apples are the same. If you’re making a pie, a Granny Smith would be a good choice, but for snacking, you might be better off with a Red Delicious. Knowing a benchmark’s purpose will help you find the results that are most meaningful to you.

Eric
