Category: What makes a good benchmark?

Apples to apples?

PCMag published a great review of the Opera browser this week. In addition to looking at the many features Opera offers, the review included performance data from multiple benchmarks, which look at areas such as hardware graphics acceleration, WebGL performance, memory consumption, and battery life.

Three of the benchmarks have a significant, though not exclusive, focus on JavaScript performance: Google Octane 2.0, JetStream 1.1, and WebXPRT 2015. The three benchmarks did not rank the browsers the same way, and in the past, we’ve discussed some of the reasons why this happens. In addition to the difference in tests, there are also sometimes differences in approaches that are worth considering.

For example, consider the test descriptions for JetStream 1.1. You’ll immediately notice that the tests are much lower-level tests than the ones in WebXPRT. However, consider these phrases from a few of the test descriptions:

  • code-first-load “…This test attempts to defeat the browser’s caching capabilities…”
  • splay-latency “Tests the worst-case performance…”
  • zlib “…modified to restrict code caching opportunities…”

 

While the XPRTs test typical performance for higher-level applications, the tests in JetStream are tweaked to stress devices in very specific ways, some of which are not typical. The information these tests provide can be very useful for engineers and developers, but may not be as meaningful to the typical user.
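To make the distinction concrete, here is a minimal sketch of how one set of timings can produce two different rankings depending on whether a benchmark summarizes typical or worst-case behavior. The browser names and latency values are hypothetical, made-up numbers chosen for illustration; they are not measurements from any of the benchmarks mentioned above, and the summary functions are only one possible way to model the two approaches.

```python
from statistics import median

# Hypothetical per-run latencies in milliseconds (illustrative values only,
# not measurements from any real browser or benchmark).
samples = {
    "browser_a": [10, 11, 10, 12, 40],  # usually fast, with one slow outlier
    "browser_b": [14, 15, 14, 15, 16],  # slower on average, but consistent
}


def rank(results, summarize):
    """Order names from best (lowest summarized latency) to worst."""
    return sorted(results, key=lambda name: summarize(results[name]))


# A "typical performance" view summarizes each browser by its median run.
typical_ranking = rank(samples, median)

# A "worst-case" view summarizes each browser by its slowest run.
worst_case_ranking = rank(samples, max)

print("typical:   ", typical_ranking)
print("worst case:", worst_case_ranking)
```

With these made-up numbers, the two views disagree about which browser comes out ahead, even though neither summary is wrong. Each one simply answers a different question.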

I have to stress that both approaches are valid, but they are doing somewhat different things. There’s a cliché about comparing apples to apples, but not all apples are the same. If you’re making a pie, a Granny Smith would be a good choice, but for snacking, you might be better off with a Red Delicious. Knowing a benchmark’s purpose will help you find the results that are most meaningful to you.

Eric

Open source?

We’re proud of the BenchmarkXPRT Development Community and its accomplishments over the last five years. We’re also thankful for the contributions the members of the community have made. One of the benefits of membership is access to the source code for all the XPRT performance tools. This has meant that the code is available to anyone willing to take the easy step of joining the community.

Behind our decision to use this model rather than a more traditional, open-source model was the need to control derivative works. The license agreement for the source allows members to modify the source, but not to claim that the results from that derivative code are XPRT results. For example, as a member, you may download the TouchXPRT source and modify the workloads for your specific purposes, but you can’t refer to the results as TouchXPRT results.

After much thought and discussion, we have come to believe that we can protect the benchmarks’ reputation within a traditional, open-source framework. While our original concerns are still valid, we think that the success and stature of the XPRTs are such that we can make the source code available via open source.

However, before we take this step, we want to hear the thoughts, concerns, and opinions of both our community members and the wider public.

Please note that if we do make the code open source, the other benefits of being a member—access to requests for comment, design documents, and community previews—will not change.

Please let us know what you think. Email us or contact us on Twitter.

Bill

Seeing the future

Back in April we wrote about how Bill’s trip to IDF16 in Shenzhen got us thinking about future benchmarks. Technologies like virtual reality, the Internet of things, and computer vision are going to open up lots of new applications.

Yesterday I saw an amazing article that talked about an automatic computer vision system that is able to detect early-stage esophageal cancer from endoscopy images. These lesions can be difficult for physicians to detect, and the system did very well when compared to four experts who participated in the test. The article contains a link to the original study, for those of you who want more detail.

To me, this is the stuff of science fiction. It’s a very impressive accomplishment. Clearly, new technologies are going to lead to many new and exciting applications.

While this type of application is more specialized than the typical XPRT, things like this get us really excited about the possibilities for the future.  Have you seen an application that impressed you recently? Let us know!

Eric

Watching students become masters

As you know, last year, PT sponsored a senior project at the Senior Design Center of North Carolina State University (NCSU). The students created Nebula Wolf, a mini game that might evolve into a future benchmark test. It was a valuable collaboration for us and a very educational experience for the students involved.

I’ve talked before about the emerging technologies we’re considering for new benchmarks. Today, I met with the folks at the NCSU Senior Design Center to discuss a possible future project. We’re hoping to harness the immense energy of these students by having them explore one of these new technologies, and then build on what they discover. Nothing is set yet, but we will, as always, keep you informed as things develop.

We’ll be sharing some exciting news about the XPRT Women Code-a-Thon tomorrow. Check back to find out more! Meanwhile, we hope you enjoy the University of Washington Tacoma article on student Viveret, the first-place winner of the XPRT Women Code-a-Thon, as much as we did.

Eric

Personal preference

I saw an interesting article recently, “Here’s why I gave up my beloved Galaxy S7 for a boring old iPhone.” It’s only been a few weeks since we featured the Samsung S7 in the XPRT Weekly Tech Spotlight, so of course I had to read it. The interesting thing is that this guy really loved his Samsung S7, and even declared it “the best smartphone I’ve ever used.” He loved its VR capabilities, camera, and look. He even prefers Android as an operating system.

So why would he give it up for an iPhone 6s Plus? Simply put, battery life. As a self-described heavy user, he found his Samsung S7 dying before 5 PM every day. The iPhone 6s Plus lasted much longer.

This is a good reminder that people have different priorities. Your priority could be having the fastest phone, the longest battery life, the best screen, or the broadest compatibility. This is why there is no such thing as “the best device.”

This is why we are always asking for your input. Knowing your priorities helps the community build better tests!

Eric

Feedback

We’re excited by the high level of interest the community and vendors have shown in the upcoming cross-platform MobileXPRT benchmark. We’ve received general observations about what a cross-platform benchmark should be, along with detailed suggestions about tests, subsystems, and benchmark architecture. We appreciate all of the responses and welcome more, so please keep them coming!

The number-one concern we’ve heard is that we be sure the benchmark tests all platforms fairly. Transparency will be essential to assure users that the tests are performing the same work on all platforms and performing the work in the appropriate way for each platform.

Fortunately, the XPRTs are well positioned to address that concern. From the beginning, we have used a community model. The source code is available to all members, which is the ultimate in transparency.  (If you’re not a community member, it’s easy to join!)

Speaking of source code, we released TouchXPRT source code to the community this week. Members can download the source here (login required).

Eric
