Category: Collaborative benchmark development

The WebXPRT 3 Community Preview is here!

on December 14, 2017

Today we’re releasing the WebXPRT 3 Community Preview (CP). As we discussed in the blog last month, in the new version of WebXPRT, we updated the photo-related workloads with new images and a new deep learning task for the Organize Album workload. We also added an optical character recognition task to the Local Notes workload and combined a portion of the DNA Sequence Analysis scenario with a writing sample/spell check scenario to simulate an online homework hub in the new “Online Homework” workload.

Also, longtime WebXPRT users will immediately notice a completely new, but clean and straightforward, UI. We’re still tweaking aspects of the UI and implementing full functionality for certain features such as social media sharing and German language translation, but we don’t anticipate making any significant changes to the overall test or individual workloads before the general release.

As with all community previews, the WebXPRT 3 CP is available only to BenchmarkXPRT Development Community members, who can access the link from the WebXPRT tab in the Members’ Area.

After you try the WebXPRT 3 CP, please send us your comments. Thanks and happy testing!

Justin

Posted in BenchmarkXPRT development community, Browser-based benchmarks, Collaborative benchmark development, Community Preview, Cross-platform benchmarks, Future of performance evaluation, German, Performance benchmarking, WebXPRT, WebXPRT 3

Nothing to hide

By Eric Hale

on November 9, 2017

I recently saw an article in ZDNet by my old friend Steven J. Vaughan-Nichols that talks about how NetMarketShare and StatCounter reported a significant jump in the operating system market shares of Linux and Chrome OS. One frustration Vaughan-Nichols alluded to in the article is the lack of transparency into how these firms calculate market share, which makes it hard to gauge how reliable their numbers are. Because neither NetMarketShare nor StatCounter discloses its methods, interested observers have no sure way to verify the figures. Vaughan-Nichols prefers the data from the federal government’s Digital Analytics Program (DAP), which makes its data freely available so you can run your own calculations. Transparency generates trust.

Transparency is a core value for the XPRTs. We’ve written before about how statistics can be misleading. That’s why we’ve always disclosed exactly how the XPRTs calculate performance results and how BatteryXPRT calculates battery life. It’s also why we make each XPRT’s source code available to community members. We want to be open and honest about how we do things, and our open development community model fosters the kind of constructive feedback that helps to continually improve the XPRTs.

We’d love for you to be a part of that process, so if you have questions or suggestions for improvement, let us know. If you’d like to gain access to XPRT source code and previews of upcoming benchmarks, today is a great day to join the community!

Eric

Posted in Battery life, BatteryXPRT 2014 for Android, BenchmarkXPRT development community, Chrome OS, Collaborative benchmark development, What makes a good benchmark?

Machine learning performance tool update

By Bill Catchings

on October 19, 2017

Earlier this year we started talking about our efforts to develop a tool to help in evaluating machine learning performance. We’ve given some updates since then, but we’ve also gotten some questions, so I thought I’d do my best to summarize our answers for everyone.

Some have asked what kinds of algorithms we’ve been looking into. As we said in an earlier blog, we’re looking at algorithms involved in computer vision, natural language processing, and data analytics, particularly different aspects of computer vision.

One seemingly trivial question we’ve received regards the proposed name, MLXPRT. We have been thinking of this tool as evaluating machine learning performance, but folks have raised a valid concern that it may well be broader than that. Does machine learning include deep learning? What about other artificial intelligence approaches? I’ve certainly seen other approaches lumped into machine learning, probably because machine learning is the hot topic of the moment. It feels like everything is boasting, “Now with machine learning!”

While there is some value in being part of such a hot movement, we’ve begun to wonder if a more inclusive name, such as AIXPRT, would be better. We’d love to hear your thoughts on that.

We’ve also had questions about the kinds of devices the tool will run on. The short answer is that we’re concentrating on edge devices. While there is a need for server AI/ML tools, we’ve been focusing on evaluating the devices closest to end users. As a result, we’re looking at the inference aspect of machine learning rather than the training aspect.

Probably the most frequent thing we’ve been asked about is the timetable. While we’d hoped to have something available this year, we were overly optimistic. We’re currently working on a more detailed proposal of what the tool will be, and we aim to make that available by the end of this year. If we achieve that goal, our next one will be to have a preliminary version of the tool itself ready in the first half of 2018.

As always, we seek input from folks, like yourself, who are working in these areas. What would you most like to see in an AI/machine learning performance tool? Do you have any questions?

Bill

Posted in AI, Benchmark metrics, Collaborative benchmark development, computer vision, Future of performance evaluation, Machine learning, What makes a good benchmark?

Everything old is new again

By Eric Hale

on September 28, 2017

I recently saw an article called “4 lessons for modern software developers from 1970s mainframe programming.” This caught my eye because I started programming in the late 1970s, and my first programming environment was an IBM 370.

The author talks about how, back in the old days, you had to write tight code because memory and computing resources were limited. He also talks about the great amount of time we spent planning, writing, proofreading, and revising our code—all on paper—before running it. We did that because computing resources were expensive and you would get in trouble for using too many. He’s right about that—I got reamed out a couple of times!

At first, it seemed like this was just another article by an old programmer talking about how sloppy and lazy the new generation is, but then he made an interesting point. Programming for embedded processors reintroduces the types of resource limitations we used to have to deal with. Cloud computing reintroduces having to pay for computing resources based on usage.

I personally think he goes too far in making his point; there are many times when rapid prototyping and iterative development are the best way to do things. However, his main thesis has merit. Some new applications may benefit from doing things the old way.

Cloud computing and embedded processors are, of course, important in machine learning applications. As we’re working on a machine learning XPRT, we’ll be following best practices for this new environment!

Eric

Posted in BenchmarkXPRT development community, Collaborative benchmark development, Future of performance evaluation, Machine learning

Decisions, decisions

By Justin Greene

on September 14, 2017

Back in April, we shared some of our initial ideas for a new version of WebXPRT, and work on the new benchmark is underway. Any time we begin the process of updating one of the XPRT benchmarks, one of the first decisions we face is how to improve workload content so it better reflects the types of technology average consumers use every day. Since benchmarks typically have a life cycle of two to four years, we want the benchmark to be relevant for at least the next couple of years.

For example, WebXPRT contains two photo-related workloads, Photo Effects and Organize Album. Photo Effects applies a series of effects to a set of photos, and Organize Album uses facial recognition technology to analyze a set of photos. In both cases, we want to use photos that represent the most relevant combination of image size, resolution, and data footprint possible. Ideally, the resulting image sizes and resolutions should differentiate processing speed on the latest systems, but not at the expense of being able to run reasonably on most current devices. We also have to confirm that the photos aren’t so large as to impact page load times unnecessarily.

In practice, this strategy means spending time researching hardware and operating system market share. Given that phones are the cameras most people use, we look at them to help define photo characteristics. In 2017, the most widespread mobile OS is Android, and while reports vary depending on the metric used, the Samsung Galaxy S5 and Galaxy S7 are at or near the top of global mobile market share. For our purposes, the data tells us that choosing photo sizes and resolutions that mirror those of the Galaxy line is a good start, because a good chunk of Android users either already use S7-generation technology or will be moving to new phones with that technology in the coming year. So, for the next version of WebXPRT, we’ll likely use photos that represent the real-life environment of an S7 user.

I hope that provides a brief glimpse into the strategies we use to evaluate workload content in the XPRT benchmarks. Of course, since the BenchmarkXPRT Development Community is an open development community, we’d love to hear your comments or suggestions!

Justin

Posted in Android, Benchmark metrics, Benchmarking, BenchmarkXPRT development community, Collaborative benchmark development, Performance benchmarking, WebXPRT, WebXPRT 2017, What makes a good benchmark?

Planning the next version of HDXPRT

By Justin Greene

on July 20, 2017

A few weeks ago, we wrote about the capabilities and benefits of HDXPRT. This week, we want to share some initial ideas for the next version of HDXPRT, and invite you to send us any comments or suggestions you may have.

The first step towards a new HDXPRT will be updating the benchmark’s workloads to increase their value in the years to come. Primarily, this will involve updating application content, such as photos and videos, to more contemporary file resolutions and sizes. We think 4K-related workloads will increase the benchmark’s relevance, but aren’t sure whether 4K playback tests are necessary. What do you think?

The next step will be to update the versions of the real-world trial applications included in the benchmark, including Adobe Photoshop Elements, Apple iTunes, Audacity, CyberLink MediaEspresso, and HandBrake. Are there any other applications you feel would be a good addition to HDXPRT’s editing photos, editing music, or converting videos test scenarios?

We’re also planning to update the UI to improve the look and feel of the benchmark and simplify navigation and functionality.

Last but not least, we’ll work to fix known problems, such as the hardware acceleration settings issue in MediaEspresso, and eliminate the need for workarounds when running HDXPRT on the Windows 10 Creators Update.

Do you have feedback on these ideas or suggestions for applications or test scenarios that we should consider for HDXPRT? Are there existing features we should remove? Are there elements of the UI that you find especially useful or would like to see improved? Please let us know. We want to hear from you and make sure that HDXPRT continues to meet your needs.

Justin

Posted in 4K, BenchmarkXPRT, Collaborative benchmark development, Future of performance evaluation, HDXPRT, HDXPRT capabilities, HDXPRT development process, HDXPRT release cycle, Let us know your thoughts, Performance benchmarking, What makes a good benchmark?, Windows 10
