BenchmarkXPRT Blog

Author Archives: Bill Catchings

Long-lasting benchmarks

While researching the Top500 list for last week’s blog, I ran across an interesting article (http://bits.blogs.nytimes.com/2011/05/09/the-ipad-in-your-hand-as-fast-as-a-supercomputer-of-yore/?ref=technology).  Its basic premise is that the iPad 2 has about the same computing power as the Cray 2 supercomputer, the world’s fastest computer in 1985.  I’m old enough to remember the Cray 1 and Cray 2 supercomputers with their unique circular shapes.  In their day, they were very expensive and, consequently, rare.  Only government agencies could afford to buy them.  Just getting to see one was a big deal.  In stark contrast, I seem to see iPads everywhere.

What was the benchmark for determining this? It was LINPACK, the same benchmark that determined the winner of the Top500 list earlier in June. Based on the LINPACK results, I am holding in my hand a device that could rival the most powerful computer in the world about 25 years ago. Another way to look at it: I have a phone faster than what was the most powerful computer in the world the year I graduated with my CS degree. And I use it to play Angry Birds… (Picture trying to convince someone in the 80s that one day millions of hand-held Cray 2 supercomputers would be used to catapult exploding birds at annoying oinking pigs.)

One interesting thought from all of this is the power of benchmarks that last over time.  While it will be a rare (and rather limited) benchmark that can last as long as LINPACK, it is important for benchmarks to not change too frequently.  On the other side of the scale is the desire for a benchmark to keep up with current technology.  With HDXPRT, we are aiming for about a year between versions.  I’d love to know whether you think that is too long, too short, or about right.

Bill


Petaflops?

I saw an article earlier this week about Japan’s K Computer, the latest machine to be designated the “fastest supercomputer” in the world. Twice a year (June and November), the Top500 list comes out. The list’s publishers consider the highest-scoring computer on the list to be the fastest computer in the world. The first article I read about the recent rankings did not cite the results, just the rankings. So, I went to another article, which referred to the K Computer as capable of 8.2 quadrillion calculations per second but did not give the results of the other leading supercomputers. On to the next article, which said the K Computer was capable of 1.2 petaflops per second. (The phrase petaflops per second is in the same category as ATM machine or PIN number…) The same article said that the third-fastest was able to get 1.75 petaflops per second. OK, now I was definitely confused. (I really miss the old days of good copy editing and fact checking, but that is a blog for another day.)

So, I went to the source, the Top500 Web site (www.top500.org).  It confirmed that the K Computer obtained 8.16 petaflops (or quadrillion calculations per second) on the LINPACK test.  The Chinese Tianhe-1A got 2.56 petaflops and the American Jaguar, 1.76 petaflops.

Once I got over the sloppy reporting and stopped playing with the graphs of the trends and scores over time, I started thinking about the problem of metrics and the importance of making them easy to understand.  Some metrics are very easy to report and understand.  For example, a battery life benchmark reports its results in hours and minutes.  We all know what this means and we know that more hours and minutes is a good thing.  Understanding what petaflops are is decidedly harder.

Another issue is the desire for bigger numbers to mean better results. The time to finish a task is fairly easy to understand, but in that case, less time is better. One technique for dealing with this issue is to normalize the numbers. Basically, that means dividing each result (such as a time) by the corresponding result from a baseline system. The baseline system’s score is typically set to 1.0 (or some other number, like 10 or 100), and other results are meaningful only in relation to the baseline system or each other. A system scoring 2.0 runs twice as fast as the baseline system’s 1.0. While that is clear, it does take more explanation than just seconds.
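To make that concrete, here is a minimal sketch in C. The numbers are invented for illustration, and the baseline-equals-100 scale is just one common choice, not the scale of any particular benchmark:

    /* Minimal sketch of score normalization, using made-up task times. */
    #include <stdio.h>

    int main(void) {
        /* Hypothetical task-completion times in seconds; lower is faster. */
        double baseline_time = 120.0;  /* reference ("baseline") system */
        double system_time   = 60.0;   /* system under test */

        /* Dividing the baseline time by the measured time flips the scale
           so that bigger is better; scaling by 100 puts the baseline at 100. */
        double score = 100.0 * (baseline_time / system_time);

        printf("Normalized score: %.1f (baseline = 100.0)\n", score);
        return 0;
    }

Because the measured time sits in the denominator, a faster system gets a higher score, which preserves the bigger-is-better convention.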

Finding the right metrics was a challenge we faced with HDXPRT 2011. Do you think we got it right? Please let us know what you think.

Bill


Knowing when to wait

Mark mentioned in his blog entry a few weeks ago that waiting sucks.  I think we can all agree with that sentiment.  However, an experience I had while in Taipei for Computex made me reevaluate that thinking a bit.  

I went jogging one morning in a park near my hotel. It was a relatively small park, just a quarter mile around the pond that took up most of it. I was one of only a couple of people jogging, but the park was full of people. Some were walking around the pond. There were also groups doing some form of Tai Chi in various clearings. The path I was on was narrow. At times, there was no way to get around the people walking without running into the ones doing Tai Chi. That in turn meant running in place. Or, put another way, waiting.

Everyone was polite at the encounters, but the contrast between me jogging and the folks doing Tai Chi was stark.  I wanted to run my miles as quickly as possible.  Those doing Tai Chi were decidedly not in a rush.  They were doing their exercises together with others.  The goal was to do them at the proper pace in the proper way.  

That got me to thinking about waiting on my computer.  (Hey, time to think is one of the main reasons I exercise!)  There are times when waiting for a computer infuriates me.  Other times, however, the computer is fast enough.  Or even too fast, like when I’m trying to scroll down to the right cell in Excel and it jumps down to a whole screen full of empty cells.  This phenomenon, of course, relates to benchmarks.  Benchmarks should measure those operations that are slow enough to hurt productivity or are downright annoying.  There is less value in measuring operations that users don’t have to wait on. 

Have you had any thoughts about what makes a good benchmark?  Even if you weren’t exercising when you had the thought, please share it with the community. 

Bill


Home sweet home

After a long set of flights back from Computex in Taipei, I’m finally home in North Carolina. Unfortunately, I’m still not quite sure what time zone I’m in!

While awake in the middle of the night, I’ve been thinking about some of the things I saw at Computex. While I was there, it seemed like a jumble of notebooks, power supplies, gaming rigs, motherboards, cases, Hello Kitty accessories, and some things I still can’t quite identify. Many of the things I saw were not brand new, but it was my first chance to see them up close. Some were technologies still on the horizon, like Intel’s Ultrabook concept and Microsoft’s Windows 8. I also saw all sorts of combinations of phones, 4G, and other devices.

One thing that stood out to me was the number and variety of tablets. They came in a variety of sizes (and screen resolutions). There were quite a few vendors, including some I would not have expected but was pleasantly surprised to encounter, like Viewsonic and Shuttle. The OS choices included Android, WebOS, and MeeGo. ASUS had a couple of interesting hybrid approaches, such as the Eee Pad Transformer and the Padfone. The former is a 10.1-inch tablet that plugs into a keyboard. The Padfone is a smartphone that can plug into the back of a larger (10.1-inch) touch screen to act as a tablet.

All of these tablet choices, as well as the iPad they all must compete against, left me wondering how to choose among them. Part of the choice comes down to size and features. As always, however, performance plays a key role. My tolerance for waiting on a tablet is even lower than it is for waiting on my PC. The problem is how to make valid comparisons across such a wide range of platforms. I’d love to hear what you think about performance testing on tablets. Is it useful? What are the best ways to accomplish it?

Finally, thanks to all the folks who came by and visited our suite at Computex.  I enjoyed getting the chance to meet some of the members of the HDXPRT Development Community.  And, hopefully, I convinced more folks to join.

Bill


Computex – Taipei

It’s hot and muggy here in Taipei. Just like home in North Carolina!

Weather aside, Taipei is definitely not Raleigh. Taipei is a big city with tall buildings. Right next to the hotel is Taipei 101, which was the world’s tallest building for a few years. The streets are full of cars and motor scooters. People here walk quickly and purposefully. All of Computex seems to be filled with similar purpose and drive. It reminds me quite a bit of COMDEX in Vegas in its prime. Technology has taken over a city that is only too glad to embrace it. In next week’s blog, I’ll let you know about some of the cool things showing here.

I’ve had some interesting HDXPRT meetings so far. One of them reminded me of some of the non-technical challenges of a successful benchmark. We’ve mentioned benchmark challenges like reliability (it needs to run when you need it to run) and repeatability (it needs to give similar results—within a few percent—each time you run it). I discussed with folks from one PC performance Web site the importance of a benchmark having some permanence. If the benchmark changes too frequently, you can’t compare the current product with the one you reviewed a couple of months ago. With HDXPRT, our goal is an annual cycle. That should allow for comparing to older results while still keeping the benchmark current.
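As a rough illustration of that repeatability check, here is a small C sketch. The scores are invented, and the three-percent threshold is just an example of what “within a few percent” might mean, not a rule from any benchmark:

    /* Rough sketch of a repeatability check across several benchmark runs.
       The scores are invented; a real harness would collect them itself. */
    #include <stdio.h>

    int main(void) {
        double runs[] = { 251.0, 249.5, 252.3, 250.1, 248.8 }; /* hypothetical scores */
        int n = sizeof(runs) / sizeof(runs[0]);

        double min = runs[0], max = runs[0], sum = 0.0;
        for (int i = 0; i < n; i++) {
            if (runs[i] < min) min = runs[i];
            if (runs[i] > max) max = runs[i];
            sum += runs[i];
        }
        double mean = sum / n;
        double spread_pct = 100.0 * (max - min) / mean;

        printf("mean = %.1f, run-to-run spread = %.2f%%\n", mean, spread_pct);
        if (spread_pct > 3.0) /* arbitrary "few percent" threshold */
            printf("Warning: results vary more than expected.\n");
        return 0;
    }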

Any folks who may be here in Taipei for Computex, please come on by the Hyatt. We can talk about HDXPRT, benchmarks in general, or what you would most like to see in the future of performance evaluation. If nothing else, come by and escape the humidity! Drop us an email at hdxprt_computex@principledtechnologies.com and set up a time to come on over.

Bill


Our community’s goal

Computer system performance evaluation has a long and complex history. Many of the earliest tests were simple, short code snippets, such as Whetstone, that did little more than give an indication of how fast a particular computer subsystem could operate. Unfortunately, such simple benchmarks quickly lost their value, in part because they were very crude measures, and in part because software tools on the systems they were measuring could easily optimize for them. In some cases, a compiler could even recognize a test and “optimize” the code by simply producing the final result!
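As a small illustration (a contrived example, not code from any of those early benchmarks), here is the kind of C loop a modern optimizing compiler can defeat. The sum has a closed-form answer, and a compiler building at -O2 may replace the loop with that value, so the timing ends up measuring almost nothing:

    /* A contrived "benchmark" loop that an optimizing compiler can remove.
       The sum equals n * (n - 1) / 2, and many compilers will pre-compute it. */
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        const long n = 100000000L;
        clock_t start = clock();

        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i; /* easily recognized and replaced with its final value */
        }

        clock_t end = clock();
        printf("sum = %ld, elapsed = %.3f s\n",
               sum, (double)(end - start) / CLOCKS_PER_SEC);
        return 0;
    }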

Over time, though, benchmarks have become more complex and more relevant. Whole organizations exist, and have existed, to build benchmarks. Notable ones include the Ziff-Davis Benchmark Operation (ZDBOp), which the Ziff-Davis computer magazines funded in the 1990s and which Mark and I ran; the Standard Performance Evaluation Corporation (SPEC), which its member companies fund and of which PT is a member; and the Business Applications Performance Corporation (BAPCo), which its member companies fund. Each of these organizations has developed widely used products, such as Winstone (ZDBOp), SPEC CPU (SPEC), and SYSmark (BAPCo). Each has also faced challenges. In the case of ZDBOp, for example, Ziff Davis could no longer support the costs of developing its benchmarks, so it discontinued the group. SPEC continues to develop good benchmarks, but its process can sometimes mean years between versions.

The goal with HDXPRT and the HDXPRT Development Community (HDC) is to explore a new way to develop benchmarks. By utilizing the expertise and experience of a community of interested people, we hope to be able to develop benchmarks in an open and collaborative environment while keeping them timely.

HDXPRT 2011 is the first test of this approach. We believe that it, along with subsequent versions and other benchmarks, will give the industry a new model for creating world-class performance measurement tools.

If you’re not a member of the HDC, please consider joining us and helping define the future of performance evaluation.

Bill

