
Category: History of benchmarking

Experience is the best teacher

One of the core principles that guides the design of the XPRT tools is that they should reflect the way real-world users use their devices. The XPRTs try to use applications and workloads that mirror what users actually do and the way real applications function. How did we learn how important this is? The hard way—by making mistakes! Here’s one example.

In the 1990s, I was Director of Testing for the Ziff-Davis Benchmark Operation (ZDBOp). The benchmarks ZDBOp created for its technical magazines became the industry standards, because of both their quality and Ziff-Davis’ leadership in the technical trade press.

WebBench, one of the benchmarks ZDBOp developed, measured the performance of early web servers. We worked hard to create a tool that used physical clients and tested web server performance over an actual network. However, we didn’t pay enough attention to how clients actually interacted with the servers. In the first version of WebBench, the clients opened connections to the server, did a small amount of work, closed the connections, and then opened new ones.

When we met with vendors after the release of WebBench, they begged us to change the model. At that time, browsers opened relatively long-lived connections and did lots of work before closing them. Our model was almost the opposite of that. It put vendors in the position of having to choose between coding to give their users good performance and coding to get good WebBench results.

Of course, we were horrified by this and worked hard to make the next version of the benchmark more closely reflect the way real browsers interacted with web servers. Subsequent versions of WebBench were much better received.
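For anyone who would rather see the difference than just read about it, here is a rough sketch of the two connection models in Python. Everything in it is illustrative: the host name and page list are made up, and the real WebBench clients were native programs driving real servers over a physical network, not a few lines of http.client.

```python
import http.client
import time

HOST = "webserver.example.com"                     # hypothetical server under test
PAGES = ["/page%d.html" % i for i in range(100)]   # made-up static pages

def short_lived_connections():
    """Roughly the original WebBench model: open, do a little work, close, repeat."""
    start = time.perf_counter()
    for path in PAGES:
        conn = http.client.HTTPConnection(HOST)   # new TCP connection per request
        conn.request("GET", path)
        conn.getresponse().read()
        conn.close()                               # torn down almost immediately
    return time.perf_counter() - start

def long_lived_connection():
    """Closer to how browsers of that era behaved: one connection, lots of work."""
    start = time.perf_counter()
    conn = http.client.HTTPConnection(HOST)        # single persistent connection
    for path in PAGES:
        conn.request("GET", path)
        conn.getresponse().read()                  # drain each response before the next request
    conn.close()
    return time.perf_counter() - start
```

A server tuned to do well on the first function can behave very differently from one tuned for the second, which is exactly the bind we put vendors in.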

This is one of the roots from which the XPRT philosophy grew. We have tried to learn and grow from the mistakes we’ve made. We’d love to hear about any of your experiences with performance tools so we can all learn together.

Eric

Another great year

A lot of great stuff happened this year! In addition to releasing new versions of the benchmarks, videos, infographics, and white papers, we released our first-ever German UI and sponsored our first student partnership at North Carolina State University. We visited three continents to promote the XPRTs and saw XPRT results published on six of the seven (we’re still working on Antarctica).

Perhaps most exciting, we reached our fifth anniversary. Users have downloaded or run the XPRTs over 100,000 times.

As great as the year has been, we are sprinting into 2016. Though I can’t talk about them yet, there are some big pieces of news coming soon. Even sooner, I will be at CES next week. If you would like to talk about the XPRTs or the future of benchmarking, let me know and we’ll find a time to meet.

Whatever your holiday traditions are, I hope you are having a great holiday season. Here’s wishing you all the best in 2016!

Eric

Chaos and opportunity

With both E3 and Apple’s WWDC happening this week, there’s been a lot of news. There’s also been a lot of hyperbolic commentary. I am not about to get into the arguments about the PS4 vs. the Xbox One or iOS 7 vs. Android.

It was Tim Cook’s presentation at WWDC that really got my attention. It’s unusual in an executive presentation to focus so much attention on a particular competitor, but Android was clearly on his mind. At one point, he focused harsh attention on fragmentation in the Android market, calling it “terrible” for developers. You can see the video here, at about 74 minutes.

As we saw in the 90s, chaos can breed innovation. At that time, the paradigm was that Macs always worked, but if you wanted the most advanced hardware, you should get a PC. I remember the editors at MacWorld, who deeply, truly loved the Mac, lusting over the (by the standards of the time) small, light, cheap notebooks PC users could get.

That being said, we understand the challenges of developing in the Android market. As I said in It’s finally here!, the Android ecosystem is sufficiently diverse that we know the benchmark will encounter configurations we’ve not seen before. If you have any problems with the MobileXPRT CP, please let us know at benchmarkxprtsupport@principledtechnologies.com. We want the benchmark to be the best it can be.

Eric


History in the making

We are quickly approaching the debut of HDXPRT 2012. It will be the second version of HDXPRT developed under the benchmark development community paradigm. This milestone provides a nice opportunity to look back at what has happened over the nearly two years since we started creating community-based benchmarks.

The most obvious accomplishment is the development of HDXPRT 2011 and HDXPRT 2012. HDXPRT 2011 has been used around the world to evaluate how well computers handle the applications and activities consumers use to create and consume content. We are hopeful that HDXPRT 2012 will be even more widely used.

We also announced a new benchmark, TouchXPRT, earlier this year. This benchmark will provide a way to evaluate the performance of emerging touch-based devices, including tablets. TouchXPRT will debut later this year, initially on Windows 8 Metro.

We have been working hard to get the word out about the benchmarks. We’ve been writing this weekly blog, conducting webinars, and generally talking with folks in the computer industry. We’ve visited with members of the community around the world at trade shows like CES in Las Vegas and Computex in Taipei. We have also spent time with members of the press and with computer hardware and software developers. Over the coming months, we are planning to revamp the Web site, add video content, and generally find ways to better engage with and extend the development community.

Less obvious, but equally important to me, has been the growth of the development community itself. Benchmarks have not been developed this way before. We are doing what we can to make the process open to the community, including releasing the benchmark source code. We are optimistic that this method will grow and become a real asset for the industry.

As we look at the growing family of benchmarks under the BenchmarkXPRT umbrella, the question is always: what’s next? How can we improve the products and the community? What performance areas do we need to look at in the future? Battery life? Macs? Phones?

Thanks so much for joining us on this journey. The members of this community are what make it work. We look forward to continuing the journey with you!

Bill


Long-lasting benchmarks

While researching the Top500 list for last week’s blog, I ran across an interesting article (http://bits.blogs.nytimes.com/2011/05/09/the-ipad-in-your-hand-as-fast-as-a-supercomputer-of-yore/?ref=technology).  Its basic premise is that the iPad 2 has about the same computing power as the Cray 2 supercomputer, the world’s fastest computer in 1985.  I’m old enough to remember the Cray 1 and Cray 2 supercomputers with their unique circular shapes.  In their day, they were very expensive and, consequently, rare.  Only government agencies could afford to buy them.  Just getting to see one was a big deal.  In stark contrast, I seem to see iPads everywhere.

What was the benchmark for determining this?  It was LINPACK, the same benchmark that determined the winner of the Top500 earlier in June.  Based on the LINPACK results, I am holding in my hand a device that could rival the most powerful in the world about 25 years ago.  Another perspective is that I have a phone faster than the most powerful computer in the world the year I graduated with my CS degree.  And, I use it to play Angry Birds…   (Picture trying to convince someone in the 80s that one day millions of hand-held Cray 2 supercomputers would be used to catapult exploding birds at annoying oinking pigs.)
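For anyone curious about what a LINPACK number actually measures, here is a minimal sketch of the idea using NumPy: time the solution of a dense system of linear equations and convert that into floating-point operations per second. This is the concept, not the official LINPACK code, and the problem size below is arbitrary.

```python
import time
import numpy as np

def linpack_style_gflops(n=2000, seed=0):
    """Time the solution of a dense n x n system Ax = b, LINPACK-style."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    start = time.perf_counter()
    x = np.linalg.solve(A, b)                  # LU factorization plus triangular solves
    elapsed = time.perf_counter() - start
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2    # standard LINPACK operation count
    return flops / elapsed / 1e9               # billions of floating-point ops per second

print(f"~{linpack_style_gflops():.1f} GFLOPS on this machine")
```

Run something like this on a modern phone or tablet and you get a feel for why the comparison to a 1985 supercomputer is not as crazy as it first sounds.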

One interesting thought from all of this is the power of benchmarks that last over time.  While it will be a rare (and rather limited) benchmark that can last as long as LINPACK, it is important for benchmarks to not change too frequently.  On the other side of the scale is the desire for a benchmark to keep up with current technology.  With HDXPRT, we are aiming for about a year between versions.  I’d love to know whether you think that is too long, too short, or about right.

Bill


Putting HDXPRT in some benchmark context

Benchmarks come in many shapes and sizes.  Some are extremely small, simple, and focused, while others are large, complex, and cover many aspects of a system.  To help position HDXPRT in the world of benchmarks, let me share with you a little taxonomy that Bill and I have long used.  No taxonomy is perfect, of course, but we’ve found this one to be very helpful as a general categorization tool.

From the perspective of how benchmarks measure performance, you can divide most of them into three groups.

Inspection tools use highly specialized tests to target very particular parts of a system. Back in the day, lo these many decades ago—okay, it was only two decades, but in dog years two tech decades is like five generations—some groups used a simple no-op loop to measure processor performance. I know, it sounds dumb today, but for a short time many felt it was a legitimate measure of processor clock speed, which is one aspect of performance. Similarly, if you want to know how fast a graphics subsystem could draw a particular kind of line, you could write code to draw lines of that type over and over.

These tools have very limited utility, because they don’t do what real users do, but for people working close to hardware, they can be useful.
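Just to illustrate the shape of an inspection tool, here is a toy version of the no-op loop in Python. Back then it would have been a handful of assembly instructions; Python’s interpreter overhead swamps everything here, so treat the number as a demonstration of how narrow such a test is rather than as a measure of anything useful.

```python
import time

def noop_loop_rate(iterations=10_000_000):
    """Inspection-tool style micro-test: time a loop that does nothing useful."""
    start = time.perf_counter()
    for _ in range(iterations):
        pass                                   # no real work at all
    elapsed = time.perf_counter() - start
    return iterations / elapsed                # empty iterations per second, and nothing more

print(f"{noop_loop_rate():,.0f} empty loop iterations per second")
```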

Moving closer to the real world, synthetic benchmarks are specially written programs that simulate the kinds of work their developers believe real users are doing. So, if you think your target users are spending all day in email, you could write your own mini email client and time functions in it.  These tools definitely move closer to real user work than inspection tools, but they still have the drawback of not actually running the programs real people are using.
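A synthetic benchmark along those lines might look something like this hypothetical mini “email” workload: it builds and searches messages with Python’s standard email library and times the whole thing, without ever touching a real email client.

```python
import time
from email.message import EmailMessage

def synthetic_email_workload(n_messages=5000):
    """Synthetic-benchmark style: a made-up 'email' workload, not a real client."""
    start = time.perf_counter()
    inbox = []
    for i in range(n_messages):                          # "receive" a pile of messages
        msg = EmailMessage()
        msg["Subject"] = f"Status report {i}"
        msg["From"] = "sender@example.com"
        msg.set_content("Body text " * 50)
        inbox.append(msg)
    hits = [m for m in inbox if "report 42" in m["Subject"]]   # "search" the inbox
    return time.perf_counter() - start, len(hits)

elapsed, hits = synthetic_email_workload()
print(f"Synthetic email workload: {elapsed:.2f} s, {hits} search hit(s)")
```

The work is plausible, but it is still the benchmark author’s guess at what an email program does, which is exactly the limitation described above.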

Application-based benchmarks take that last step by using real applications, the same programs that users employ in the real world. These benchmarks cause those applications to perform the kinds of actions that real users take, and they time those actions.  You can always argue about how representative they are—more on that in a future blog entry, assuming I don’t forget to write it—but they are definitely closer to the real world because they’re using real applications.
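In the same spirit, an application-based approach times a real program doing real work. A full application-based benchmark like HDXPRT drives real commercial applications; the sketch below settles for timing a real command-line tool (ImageMagick’s convert, with made-up file names) as a stand-in for that idea.

```python
import subprocess
import time

def time_real_application(command):
    """Application-benchmark style: launch a real program and time a real task."""
    start = time.perf_counter()
    subprocess.run(command, check=True)        # the actual application does the work
    return time.perf_counter() - start

# Hypothetical example: time a real image tool resizing a (made-up) photo.
elapsed = time_real_application(
    ["convert", "vacation.jpg", "-resize", "25%", "vacation_small.jpg"]
)
print(f"Image conversion took {elapsed:.2f} s")
```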

With all of that background, HDXPRT becomes easy to classify:  it’s an application-based benchmark.

Mark Van Name

