Category: Benchmark metrics

More details about CloudXPRT’s workloads

on April 2, 2020

About a month ago, we posted an update on the CloudXPRT development process. Today, we want to provide more details about the three workloads we plan to offer in the initial preview build:

In the web-tier microservices workload, a simulated user logs in to a web application that does three things: provides a selection of stock options, performs Monte-Carlo simulations with those stocks, and presents the user with options that may be of interest. The workload reports performance in transactions per second, which testers can use to directly compare IaaS stacks and to evaluate whether any given stack is capable of meeting service-level agreement (SLA) thresholds.
The machine learning (ML) training workload calculates XGBoost model training time. XGBoost is a gradient-boosting framework that data scientists often use for ML-based regression and classification problems. The purpose of the workload in the context of CloudXPRT is to evaluate how well an IaaS stack enables XGBoost to speed and optimize model training. The workload reports latency and throughput rates. As with the web-tier microservices workload, testers can use this workload’s metrics to compare IaaS stack performance and to evaluate whether any given stack is capable of meeting SLA thresholds.
The AI-themed container scaling workload starts up a container and uses a version of the AIXPRT harness to launch Wide and Deep recommender system inference tasks in the container. Each container represents a fixed amount of work, and as the number of Wide and Deep jobs increases, CloudXPRT launches more containers in parallel to handle the load. The workload reports both the startup time for the containers and the Wide and Deep throughput results. Testers can use this workload to compare container startup time between IaaS stacks; optimize the balance between resource allocation, capacity, and throughput on a given stack; and confirm whether a given stack is suitable for specific SLAs.

We’re continuing to move forward with CloudXPRT development and testing and hope to add more workloads in subsequent builds. Like most organizations, we’ve adjusted our work patterns to adapt to the COVID-19 situation. While this has slowed our progress a bit, we still hope to release the CloudXPRT preview build in April. If anything changes, we’ll let folks know as soon as possible here in the blog.

If you have any thoughts or comments about CloudXPRT workloads, please feel free to contact us.

Justin

Posted in AI, AIXPRT, benchmark, Benchmark metrics, BenchmarkXPRT, Cloud, CloudXPRT, container scaling, Datacenter, Hybrid cloud, Machine learning, microservices, On-premise, recommender system, Servers, Wide and Deep | Tagged AIXPRT, cloud, CloudXPRT, container scaling, IaaS, machine learning, SLA |

Thinking ahead to WebXPRT 4

By Justin Greene

on March 5, 2020

It’s been about two years since we released WebXPRT 3, and we’re starting to think about the WebXPRT 4 development cycle. With over 529,000 runs to date, WebXPRT continues to be our most popular benchmark because it’s quick and easy to run, it runs on almost anything with a web browser, and it evaluates performance using the types of web technologies that many people use every day.

For each new version of WebXPRT, we start the development process by looking at browser trends and analyzing the feasibility of incorporating new web technologies into our workload scenarios. For example, in WebXPRT 3, we updated the Organize Album workload to include an image-classification task that uses deep learning. We also added an optical character recognition task to the Encrypt Notes and OCR scan workload, and introduced a new Online Homework workload that combined part of the DNA Sequence Analysis scenario with a writing sample/spell check scenario.

Here are the current WebXPRT 3 workloads:

Photo Enhancement: Applies three effects, each using Canvas, to two photos.
Organize Album Using AI: Detects faces and classifies images using the ConvNetJS neural network library.
Stock Option Pricing: Calculates and displays graphic views of a stock portfolio using Canvas, SVG, and dygraphs.js.
Encrypt Notes and OCR Scan: Encrypts notes in local storage and scans a receipt using optical character recognition.
Sales Graphs: Calculates and displays multiple views of sales data using InfoVis and d3.js.
Online Homework: Performs science and English assignment tasks using Web Workers and Typo.js spell check.

What new technologies or workload scenarios should we add? Are there any existing features we should remove? Would you be interested in an associated battery life test? We want to hear your thoughts and ideas about WebXPRT, so please tell us what you think!

Justin

Posted in AI, benchmark, Benchmark metrics, Benchmarking, BenchmarkXPRT, Browser-based benchmarks, Collaborative benchmark development, Cross-platform benchmarks, image classification, JavaScript, Performance benchmarking, WebXPRT, WebXPRT 3, What makes a good benchmark? | Tagged benchmark, browsers, WebXPRT, WebXPRT 3 |

CloudXPRT development news

By Justin Greene

on February 27, 2020

Last month, Bill announced that we were starting work on a new data center benchmark. CloudXPRT will measure the performance of modern, cloud-first applications deployed on infrastructure as a service (IaaS) platforms—on-premises platforms, externally hosted platforms, and hybrid clouds that use a mix of the two. Our ultimate goal is for CloudXPRT to use cloud-native components on an actual stack to produce end-to-end performance metrics that can help users determine the right IaaS configuration for their business.

Today, we want to provide a quick update on CloudXPRT development and testing.

Installation. We’ve completely automated the CloudXPRT installation process, which leverages Kubernetes or Ansible tools depending on the target platform. The installation processes differ slightly for each platform, but testing is the same.
Workloads. We’re currently testing potential workloads that focus on three areas: web microservices, data analytics, and container scaling. We might not include all of these workloads in the first release, but we’ll keep the community informed and share more details about each workload as the picture becomes clearer. We are designing the workloads so that testers can use them to directly compare IaaS stacks and evaluate whether any given stack can meet service level agreement (SLA) thresholds.
Platforms. We want CloudXPRT to eventually support testing on a variety of popular externally hosted platforms. However, constructing a cross-platform benchmark is complicated and we haven’t yet decided which external platforms the first CloudXPRT release will support. We’ve successfully tested the current build with on-premises IaaS stacks and with one externally hosted platform, Amazon Web Services. Next, we will test the build on Google Cloud Hosting and Microsoft Azure.
Timeline. We are on track to meet our target of releasing a CloudXPRT preview build in late March and the first official build about two months later. If anything changes, we’ll post an updated timeline here in the blog.

If you would like to share any thoughts or comments related to CloudXPRT or cloud benchmarking, please feel free to contact us.

Justin

Posted in benchmark, Benchmark metrics, Benchmarking, Cloud, CloudXPRT, Collaborative benchmark development, container scaling, Datacenter, Future of performance evaluation, Hosted cloud, Hybrid cloud, IaaS, On-premise, Servers, web microservices | Tagged benchmark, cloud, CloudXPRT, datacenter, hosted cloud, hybrid cloud, IaaS, on-premise |

The XPRT activity we have planned for first half of 2020

By Justin Greene

on January 23, 2020

Today, we want to let readers know what to expect from the XPRTs over the next several months. Timelines and details can always change, but we’re confident that community members will see CloudXPRT Community Preview (CP), updated AIXPRT, and CrXPRT 2 releases during the first half of 2020.

CloudXPRT

Last week, Bill shared some details about our new datacenter-oriented benchmark, CloudXPRT. If you missed that post, we encourage you to check it out and learn more about the need for a new kind of cloud benchmark, and our plans for the benchmark’s structure and metrics. We’re already testing preliminary builds, and aim to release a CloudXPRT CP in late March, followed by a version for general availability roughly two months later.

AIXPRT

About a month ago, we explained how the number of moving parts in AIXPRT will necessitate a different development approach than we’ve used for other XPRTs. AIXPRT will require more frequent updating than our other benchmarks, and we anticipate releasing the second version of AIXPRT by mid-year. We’re still finalizing the details, but it’s likely to include the latest versions of ResNet-50 and SSD-MobileNet, selected SDK updates, ease-of-use improvements for the harness, and improved installation scripts. We’ll share more detailed information about the release timeline here in the blog as soon as possible.

CrXPRT 2

As we mentioned in December, we’re working on CrXPRT 2, the next version of our benchmark that evaluates the performance and battery life of Chromebooks. You can find out more about how CrXPRT works both here in the blog and at CrXPRT.com.

We’re currently testing an alpha version of CrXPRT 2. Testing is going well, but we’re tweaking a few items and refining the new UI. We should start testing a CP candidate in the next few weeks, and will have firmer information for community members about a CP release date very soon.

We’re excited about these new developments and the prospect of extending the XPRTs into new areas. If you have any questions about CloudXPRT, AIXPRT, or CrXPRT 2, please feel free to ask!

Justin

Posted in AI, AIXPRT, benchmark, Benchmark metrics, BenchmarkXPRT, BenchmarkXPRT development community, Chromebooks, Cloud, CloudXPRT, Collaborative benchmark development, Community Preview, CrXPRT, Datacenter, ResNet-50, SSD-MobileNet v1 | Tagged AIXPRT, Chrome OS, Chromebooks, ChromeXPRT, CloudXPRT, CrXPRT, datacenter, ResNet-50, SSD-MobileNet v1 |

How to use alternate configuration files with AIXPRT

By Justin Greene

on October 10, 2019

In last week’s AIXPRT Community Preview 3 announcement, we mentioned the new public GitHub repository that we’re using to publish AIXPRT-related information and resources. In addition to the installation readmes for each AIXPRT installation package, the repository contains a selection of alternative test config files that testers can use to quickly and easily change a test’s parameters.

As we discussed in previous blog entries about batch size, levels of precision, and number of concurrent instances, AIXPRT testers can adjust each of these key variables by editing the JSON file in the AIXPRT/Config directory. While the process is straightforward, editing each of the variables in a config file can take some time, and testers don’t always know the appropriate values for their system. To address both of these issues, we are offering a selection of alternative config files that testers can download and drop into the AIXPRT/Config directory.

In the GitHub repository, we’ve organized the available config files first by operating system (Linux_Ubuntu and Windows) and then by vendor (All, Intel, and NVIDIA). Within each section, testers will find preconfigured JSON files set up for several scenarios, such as running with multiple concurrent instances on a system’s CPU or GPU, running with FP32 precision instead of FP16, etc. The picture below shows the preconfigured files that are currently available for systems running Ubuntu on Intel hardware.

Because potential AIXPRT use cases cut across a wide range of hardware segments, including desktops, edge devices, and servers, not all AIXPRT workloads and configs will be applicable to each segment. As we move towards the AIXPRT GA, we’re working to find the best way to parse out these distinctions and communicate them to end users. In many cases, the ideal combination of test configuration variables remains an open question for ongoing research. However, we hope the alternative configuration files will help by giving testers a starting place.

If you experiment with an alternative test configuration file, please note that it should replace the existing default config file. If more than one config file is present, AIXPRT will run all the configurations and generate a separate result for each. More information about the config files and detailed instructions for how to handle the files are available in the EditConfig.md document in the public repository.

We’ll continue to keep everyone up to date with AIXPRT news here in the blog. If you have any questions or comments, please let us know.

Justin

Posted in AI, AIXPRT, benchmark, Benchmark metrics, Collaborative benchmark development, Community Preview, GitHub, Intel, Linux, Machine learning, NVIDIA, Performance benchmarking, Ubuntu | Tagged AI, AIXPRT, batch size, concurrent instances, FP16, FP32, GitHub, Intel, NVIDIA, precision, Windows |

Understanding the basics of AIXPRT precision settings

By Justin Greene

on September 5, 2019

A few weeks ago, we discussed one of AIXPRT’s key configuration variables, batch size. Today, we’re discussing another key variable: the level of precision. In the context of machine learning (ML) inference, the level of precision refers to the computer number format (FP32, FP16, or INT8) representing the weights (parameters) a network model uses when performing the calculations necessary for inference tasks.

Higher levels of precision for inference tasks help decrease the number of false positives and false negatives, but they can increase the amount of time, memory bandwidth, and computational power necessary to achieve accurate results. Lower levels of precision typically (but not always) enable the model to process inputs more quickly while using less memory and processing power, but they can allow a degree of inaccuracy that is unacceptable for certain real-world applications.

For example, a high level of precision may be appropriate for computer vision applications in the medical field, where the benefits of hyper-accurate object detection and classification far outweigh the benefit of saving a few milliseconds. On the other hand, a low level of precision may work well for vision-based sensors in the security industry, where alert time is critical and monitors simply need to know if an animal or a human triggered a motion-activated camera.

FP32, FP16, and INT8

In AIXPRT, we can instruct the network models to use FP32, FP16, or INT8 levels of precision:

FP32 refers to single-precision (32-bit) floating point format, a number format that can represent an enormous range of values with a high degree of mathematical precision. Most CPUs and GPUs handle 32-bit floating point operations very efficiently, and many programs that use neural networks, including AIXPRT, use FP32 precision by default.
FP16 refers to half-precision (16-bit) floating point format, a number format that uses half the number of bits as FP32 to represent a model’s parameters. FP16 is a lower level of precision than FP32, but it still provides a great enough numerical range to successfully perform many inference tasks. FP16 often requires less time than FP32, and uses less memory.
INT8 refers to the 8-bit integer data type. INT8 data is better suited for certain types of calculations than floating point data, but it has a relatively small numeric range compared to FP16 or FP32. Depending on the model, INT8 precision can significantly improve latency and throughput, but there may be a loss of accuracy. INT8 precision does not always trade accuracy for speed, however. Researchers have shown that a process called quantization (i.e., approximating continuous values with discrete counterparts) can enable some networks, such as ResNet-50, to run INT8 precision without any significant loss of accuracy.

Configuring precision in AIXPRT

The screenshot below shows part of a sample config file, the same sample file we used for our batch size discussion. The value in the “precision” row indicates the precision setting. This test configuration would run tests using INT8. To change the precision, a tester simply replaces that value with “fp32” or “fp16” and saves the changes.

Note that while decreasing the precision from FP32 to FP16 or INT8 often results in larger throughput numbers and faster inference speeds overall, this is not always the case. Many other factors can affect ML performance, including (but not limited to) the complexity of the model, the presence of specific ML optimizations for the hardware under test, and any inherent limitations of the target CPU or GPU.

As with most AI-related topics, the details of model precision are extremely complex, and it’s a hot topic in cutting edge AI research. You don’t have to be an expert, however, to understand how changing the level of precision can affect AIXPRT test results. We hope that today’s discussion helped to make the basics of precision a little clearer. If you have any questions or comments, please feel free to contact us.

Justin

Posted in AI, AIXPRT, Benchmark metrics, Benchmarking, computer vision, image processing, Machine learning, object detection, ResNet-50 | Tagged AIXPRT, benchmark, computer vision, FP16, FP32, image processing, INT8, machine learning, object detection, precision |

Category: Benchmark metrics

More details about CloudXPRT’s workloads

Thinking ahead to WebXPRT 4

CloudXPRT development news

The XPRT activity we have planned for first half of 2020

CloudXPRT

AIXPRT

CrXPRT 2

How to use alternate configuration files with AIXPRT

Understanding the basics of AIXPRT precision settings

FP32, FP16, and INT8

Configuring precision in AIXPRT

Check out the other XPRTs: