How Omni-Path simplifies compute architecture

Supercomputing in the 1980s and early ‘90s was about streamlining a small number of very fast processors. Once the paradigm shifted to clusters of thousands of commodity processors, however, the interconnects between those processors became vastly more significant. Today, the Intel® Omni-Path interconnect dramatically simplifies and improves that compute architecture.

Nowadays, the world’s fastest supercomputers run as many as 10 million cores, and even machines ranked 100th to 300th in the Top500 list operate with tens of thousands of cores. This represents a mind-boggling amount of CPU and GPU traffic, and it deserves a fabric solution expressly made to interconnect systems ranging from just a few cores to millions of cores.

Intel® Omni-Path Architecture (Intel® OPA) emerged from the proven Aries interconnect and Intel® True Scale Fabric (InfiniBand). Crucially, OPA has always been a high-performance computing interconnect fabric. Its design takes the best of those proven technologies and adds revolutionary features not found in other interconnects to meet the needs of HPC clusters as they continue to grow toward exascale. OPA’s HPC efficiency derives in no small part from being designed and developed specifically as a cutting-edge HPC interconnect, rather than beginning life as a general-purpose high-speed interconnect.

OPA is thus a next-generation fabric, spanning host interfaces, switches, and cabling, that is designed specifically for HPC workloads. Intel® OPA is built to deliver high message injection rates and low latency while providing full bisection bandwidth across very large clusters.
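
To put “full bisection bandwidth” in perspective at these link rates, here is a quick arithmetic sketch of my own, using the 100 Gbps per-port figure cited later in this article and a few arbitrary example cluster sizes; it simply counts the links that cross the middle of the fabric when any half of the nodes talks to the other half.

```python
# Quick arithmetic: what full bisection bandwidth implies at 100 Gbps per port.
# Illustration of mine; the node counts below are arbitrary examples.
LINK_GBPS = 100

for nodes in (768, 10_000, 100_000):
    # With full bisection bandwidth, any half of the nodes can stream to the
    # other half at full link rate, so nodes/2 links cross the bisection.
    bisection_tbps = (nodes // 2) * LINK_GBPS / 1000
    print(f"{nodes:>7} nodes -> {bisection_tbps:,.1f} Tbps across the bisection")
```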

For instance, today’s family of 48-port edge switches and 192-port and 768-port director-class switches integrates holistically into clusters of all sizes, especially clusters built with HPE Apollo or HPE SGI 8600 servers, which benefit from custom OPA switches and adapters. OPA’s 48-port switch silicon provides density, power, and performance efficiencies that translate into a superior high-performance, low-latency interconnect.

As noted elsewhere, the simpler topology enabled by OPA’s higher-radix switch chip translates into greater performance, lower latency, and lower cost, especially as cluster size grows. For example, a 768-node configuration can be built as a three-hop fat tree using a single OPA director-class switch. The same configuration on EDR InfiniBand would require forty-three 36-port edge switches plus two EDR director-class switches, forcing a five-hop fat tree. In addition to avoiding the 50% latency overhead of the EDR configuration, the OPA design uses far fewer switches, 50% fewer cables, and 79% less rack space, which adds up to significant cost savings.
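
The topology arithmetic behind those numbers is easy to check. The short Python sketch below is a back-of-the-envelope model of my own, not an Intel sizing tool: it assumes a two-tier, full-bisection fat tree in which each edge switch splits its ports evenly between node links and uplinks, and it shows why a 36-port radix cannot reach 768 nodes in two tiers while a 48-port radix can, and where the count of 43 edge switches in the EDR comparison comes from.

```python
# Back-of-the-envelope fat-tree sizing: why switch radix matters.
# Assumption (mine, not from the article): a two-tier, full-bisection fat tree
# built from radix-r switches supports at most r*r/2 nodes, with each edge
# switch splitting its ports evenly between nodes and spine uplinks.
import math

def two_tier_capacity(radix: int) -> int:
    """Maximum nodes in a two-tier (three switch hops) full-bisection fat tree."""
    return radix * radix // 2

def edge_switches_needed(nodes: int, radix: int) -> int:
    """Edge switches required when half of each switch's ports face the nodes."""
    return math.ceil(nodes / (radix // 2))

for radix in (48, 36):
    cap = two_tier_capacity(radix)
    fits = "fits in two tiers" if cap >= 768 else "needs an extra tier"
    print(f"radix {radix}: two-tier limit {cap} nodes; "
          f"768 nodes -> {edge_switches_needed(768, radix)} edge switches ({fits})")
```

Run as-is, this sketch reports a two-tier limit of 1,152 nodes at radix 48 but only 648 at radix 36, and 43 edge switches for the 768-node case at radix 36, which matches the EDR figure cited above.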

However, the transformative economics of OPA only begin with the simpler topology. OPA is also part of a broader Intel® Scalable System Framework, which is quietly transforming HPC via tightly integrated components that support a wide range of workloads on a common infrastructure.

For instance, Intel® Xeon® processors continue to power most of the world’s fastest supercomputers, with 471 systems based on Intel technology listed in the November 2017 Top500 list. Moreover, Intel® Node Manager, Intel® Parallel Studio, and Intel® Trace Analyzer round out the Intel software development tools for HPC that help boost application performance on highly parallel processors.

At the granular level, OPA also delivers a carefully crafted architecture built for HPC clusters that push the frontiers of scale. OPA’s link layer, for instance, delivers low and deterministic latency, penalty-free packet integrity, and resilient operation in the event of a lane failure. OPA also intelligently determines the best method of data transport available to it, setting up an RDMA channel through the host adapter or using the fast, efficient resources of the Intel® Xeon® processor to transfer data. To further reduce latency and increase performance, OPA’s traffic optimization allows more granular prioritization in clusters with data contention, which significantly reduces run-to-run inconsistency and shortens time to completion.
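
One simple way to observe those latency characteristics on a deployed fabric is a small-message ping-pong micro-benchmark between two nodes. The sketch below is an illustrative example of mine, not an Intel tool; it assumes mpi4py and NumPy on top of any MPI library built with Omni-Path support, and times round trips of an 8-byte message.

```python
# Minimal MPI ping-pong latency sketch (assumes mpi4py and an MPI library that
# runs over the Omni-Path fabric). Illustrative only, not an Intel benchmark.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

iters = 10_000
buf = np.zeros(8, dtype=np.uint8)   # small 8-byte message to expose latency

comm.Barrier()
start = MPI.Wtime()
for _ in range(iters):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
elapsed = MPI.Wtime() - start

if rank == 0:
    # One-way latency is half the round-trip time per iteration.
    print(f"mean one-way latency: {elapsed / iters / 2 * 1e6:.2f} us")
```

Launched with, for example, mpirun -np 2 python pingpong.py across two hosts on the fabric, rank 0 prints a mean one-way latency; repeating the run a few times is a quick check of the run-to-run consistency discussed above.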

In all, Intel® OPA condenses HPC interconnect topology into a robust, high-performance architecture. Its end-to-end 100 Gbps speed delivers high bandwidth and low latency in a streamlined solution whose performance improves with scale and which integrates holistically with the Intel® Scalable System Framework.

To discover how OPA can simplify, streamline, and accelerate your HPC workloads, explore HPE and Intel® OPA HPC solutions today.
