Latency
In the context of machine learning, latency refers to the time a system takes to process a single inference request.
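To make that concrete, here is a minimal sketch (in Python, and not part of AIXPRT itself) of how per-request latency is commonly measured; run_inference is a hypothetical stand-in for a real model call in whatever framework you use.

import time

def run_inference(image):
    # Hypothetical placeholder for a real model call (e.g., a framework's predict/forward).
    time.sleep(0.01)  # simulate roughly 10 ms of inference work
    return "label"

def measure_latency(image):
    # Wall-clock time, in milliseconds, for one inference request.
    start = time.perf_counter()
    run_inference(image)
    return (time.perf_counter() - start) * 1000.0

print(f"Latency: {measure_latency(image=None):.1f} ms")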
What is a good latency result?
In real-time or near real-time use cases such as performing image recognition on individual photos being captured by a camera, lower latency is important because it improves the user experience. In other cases, such as performing image recognition on a large library of photos, achieving higher throughput might be preferable; designating larger batch sizes or running concurrent instances might allow the overall workload to complete more quickly.
The dynamics of these performance tradeoffs mean that there is no single “good” score for all machine learning scenarios. Some testers might prefer lower latency, while others would sacrifice latency to achieve the higher level of throughput that their use case demands.
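To see the tradeoff in numbers, consider a purely illustrative example: suppose a model needs 10 ms to process a batch of one image but 160 ms to process a batch of 32. The larger batch makes each request wait longer, yet it moves far more images per second; the short Python calculation below (with made-up timings) works this out.

# Illustrative timings only; real values depend on the model, precision, and hardware.
batch_times_ms = {1: 10.0, 32: 160.0}  # batch size -> time to process the whole batch

for batch_size, batch_ms in batch_times_ms.items():
    throughput = batch_size / (batch_ms / 1000.0)  # images processed per second
    print(f"batch {batch_size:>2}: latency {batch_ms:6.1f} ms, throughput {throughput:6.1f} images/s")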
When latency is your top priority
Suppose you’re a data science firm helping a client optimize manufacturing yield by speeding up the inspection process. Cameras photograph each product and upload the pictures to a server running software that queries a machine learning model. Because your goal is to identify defective items as quickly as possible, your priority is latency, and you could adjust the AIXPRT variables to use small batch sizes and a single instance.
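As a rough sketch of what that tuning might look like, the Python snippet below writes a latency-oriented profile to a JSON file. The key names (batch_sizes, concurrent_instances) are placeholders chosen for illustration, not the verbatim AIXPRT configuration schema; consult the AIXPRT documentation for the exact settings and file format.

import json

# Hypothetical latency-oriented settings; key names are illustrative placeholders.
latency_profile = {
    "batch_sizes": [1],         # one image per request keeps per-request time low
    "concurrent_instances": 1,  # a single instance avoids contention for the hardware
}

with open("latency_profile.json", "w") as f:
    json.dump(latency_profile, f, indent=2)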
Throughput
In the context of machine learning, throughput refers to the number of inputs a system processes in a given time period.
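A minimal sketch of how throughput is typically computed: divide the number of inputs processed by the elapsed wall-clock time. As before, run_inference_batch is a hypothetical stand-in for a real batched model call.

import time

def run_inference_batch(batch):
    # Hypothetical placeholder for a real batched model call.
    time.sleep(0.001 * len(batch))  # simulate work proportional to batch size
    return ["label"] * len(batch)

def measure_throughput(inputs, batch_size):
    # Inputs processed per second when running in fixed-size batches.
    start = time.perf_counter()
    for i in range(0, len(inputs), batch_size):
        run_inference_batch(inputs[i:i + batch_size])
    return len(inputs) / (time.perf_counter() - start)

images = [None] * 256  # stand-ins for real image tensors
print(f"Throughput: {measure_throughput(images, batch_size=32):.1f} images/s")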
What is a good throughput result?
In use cases such as performing image recognition on a large library of photos, higher throughput is usually preferable; designating larger batch sizes or running concurrent instances might allow the overall workload to complete more quickly. In real-time or near real-time use cases, such as performing image recognition on individual photos as a camera captures them, lower latency matters more because it improves the user experience.
As with latency, there is no single “good” throughput score for all machine learning scenarios. Some testers would sacrifice latency to achieve the higher level of throughput that their use case demands, while others prefer to keep per-request latency as low as possible.
When throughput is your top priority
Suppose you’re an archivist with hundreds of thousands of historical photographs you must categorize. Because your goal is to process this large volume of images efficiently, you care most about throughput, and you could adjust the AIXPRT variables to use large batch sizes and run concurrent instances.
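In contrast with the latency profile sketched earlier, a throughput-oriented setup would push batch sizes up and run several instances at once. Again, the key names below are illustrative placeholders rather than the verbatim AIXPRT schema.

import json

# Hypothetical throughput-oriented settings; key names are illustrative placeholders.
throughput_profile = {
    "batch_sizes": [32, 64],    # larger batches keep the hardware fully utilized
    "concurrent_instances": 4,  # several instances process batches in parallel
}

with open("throughput_profile.json", "w") as f:
    json.dump(throughput_profile, f, indent=2)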
Why does this matter for servers and cloud instances?
Currently, servers and cloud instances handle the bulk of machine learning inference work for many common apps. For app publishers, the process of choosing an ideal back-end setup can be difficult. Depending on the purpose of an app, publishers may want to prioritize low latency, high throughput, or a finely tuned balance between the two. Regardless of the priority, AIXPRT provides users with accurate and reliable data about these critical inference measures. That data can help businesses make successful infrastructure decisions and avoid costly periods of trial and error or customer dissatisfaction.
Why does this matter for PCs?
As desktop and laptop computing power continues to grow, an increasing number of app publishers are moving inference workloads from the back end to the client side. With inference work on the client side, apps can provide users with features such as voice and facial recognition, image classification, object detection, recommendations, and pattern recognition, even while offline or in areas that have slow or spotty network connections. Users who rely on these powerful apps need to know whether their gear can handle the load, and AIXPRT can help identify which PCs are up to the task.