Simplify HPC with HPE and Intel® Omni-Path Architecture
The Intel® Omni-Path Architecture simplifies network infrastructure. Achieving efficient performance at scale remains the core challenge in HPC today, and the interconnect fabric is increasingly a cluster's choke point. As a result, organisations spend as much as 40 percent of their hardware budget on HPC fabric. In other words, the infrastructure that connects the system is taking over the system itself.
There is a better way. Hewlett Packard Enterprise and Intel have integrated Intel® Omni-Path fabric into HPE's latest generation of HPC servers. HPE clusters built on this high-speed, low-latency fabric are becoming the system to beat for many applications.
Intel's Omni-Path Architecture (OPA) delivers 100 Gbps of bandwidth per port, with low latency even at extreme scale. Its 48-port switch silicon (radix 48) can reduce the number of switches by as much as 50 percent in a typical fat-tree configuration. This smaller interconnect footprint translates into less rack space and lower power and cooling costs. It also reduces fabric cost: a typical Omni-Path solution accounts for just 21 percent of an HPC hardware budget.
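To see where a larger switch radix saves switches, the sketch below counts switches in a non-blocking fat tree. It is a simplified model, not Intel's sizing method, and it rests on stated assumptions: each edge switch splits its ports evenly between nodes and uplinks, a third tier is added only when two tiers are full, and both the 36-port comparison radix and the `fat_tree_switches` helper are hypothetical.

```python
import math


def fat_tree_switches(nodes: int, radix: int) -> int:
    """Approximate switch count for a non-blocking fat tree (folded Clos).

    Simplified model: each edge switch uses half its ports for nodes and
    half for uplinks; a third tier is added only when two tiers are full.
    """
    half = radix // 2
    if nodes <= radix:
        return 1                                   # one switch connects everything
    edge = math.ceil(nodes / half)                 # edge (leaf) switches
    if nodes <= radix * half:                      # two-tier limit: radix**2 / 2 nodes
        spine = math.ceil(edge * half / radix)     # spine ports must absorb all uplinks
        return edge + spine
    if nodes <= radix * half * half:               # three-tier limit: radix**3 / 4 nodes
        aggregation = edge                         # one aggregation switch per edge switch
        core = math.ceil(aggregation * half / radix)
        return edge + aggregation + core
    raise ValueError("this model covers at most three tiers")


# Compare a hypothetical 36-port switch with a 48-port switch.
for nodes in (648, 1152):
    print(nodes, fat_tree_switches(nodes, 36), fat_tree_switches(nodes, 48))
```

Under this model, 648 nodes need 54 switches at radix 36 and 41 at radix 48, while 1,152 nodes need 160 and 72 respectively. The largest savings appear when the smaller radix forces an extra tier; real designs (director switches, oversubscription, cabling limits) will differ.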
Other key OPA features and innovations integrated into HPE clusters and servers include:
- Adaptive Routing scales as fabrics grow larger and more complex. Intel's Fabric Manager and the switch ASICs keep multiple egress options open for each destination and update routing continuously as links are added or removed.
- Dispersive Routing promotes efficient use of the fabric. Omni-Path continuously defines alternate routes and spreads traffic across them, relieving congestion and improving redundancy, performance, and load balancing.
- Traffic Flow Optimisation moves high-priority traffic through the fabric even when lower-priority traffic is present. Other leading fabrics today cannot let a higher-priority packet that becomes ready mid-transmission go ahead of a packet already being sent. By contrast, Omni-Path breaks packets into smaller containers, so a higher-priority packet can move to the head of the line even if a lower-priority packet has already begun transmitting.
- Packet Integrity Protection eliminates the need for transport-level timeouts and end-to-end retries. Because of OPA's high signaling rate (25.78125 Gbps per lane) and its ability to support 100,000 or more links, transient bit errors must be tolerated without hurting performance. Packet Integrity Protection recovers from transient errors on both host-to-switch and switch-to-switch links.
- Dynamic Lane Scaling lets operations continue even if one or more lanes of a 4x link fail, avoiding the need to restart or roll back to a previous checkpoint. The job can run to completion before any action is taken to resolve the issue.
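The Adaptive Routing behaviour in the list above can be illustrated with a toy switch that keeps several candidate egress ports per destination, sends each packet to the least-congested one, and drops a port from every route when its link is removed. The `AdaptiveRouter` class and its least-queue-depth policy are hypothetical illustrations, not the Fabric Manager's or the switch ASIC's actual algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveRouter:
    """Toy switch that keeps several egress ports open per destination."""

    routes: dict[str, set[int]] = field(default_factory=dict)  # destination -> candidate ports
    queue_depth: dict[int, int] = field(default_factory=dict)  # port -> packets queued (never drained here)

    def add_link(self, port: int, destinations: list[str]) -> None:
        """Register a new link and the destinations reachable through it."""
        self.queue_depth.setdefault(port, 0)
        for dest in destinations:
            self.routes.setdefault(dest, set()).add(port)

    def remove_link(self, port: int) -> None:
        """Drop a failed or removed link from every destination's candidates."""
        self.queue_depth.pop(port, None)
        for ports in self.routes.values():
            ports.discard(port)

    def forward(self, dest: str) -> int:
        """Send one packet to dest via the least-congested candidate port."""
        candidates = self.routes.get(dest)
        if not candidates:
            raise LookupError(f"no route to {dest}")
        # Ties on queue depth go to the lowest port number.
        port = min(candidates, key=lambda p: (self.queue_depth[p], p))
        self.queue_depth[port] += 1
        return port


router = AdaptiveRouter()
router.add_link(1, ["nodeA"])
router.add_link(2, ["nodeA"])
print([router.forward("nodeA") for _ in range(4)])  # [1, 2, 1, 2]
router.remove_link(2)
print(router.forward("nodeA"))  # 1
```

Traffic alternates between the two open ports while both are healthy, and shifts entirely to port 1 once port 2's link is removed, with no change needed at the sender.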
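Dispersive Routing, as described in the list above, can be pictured as spreading traffic over several alternate paths between the same endpoints. This sketch uses simple round-robin assignment as a stand-in policy; the `disperse` function and the path names are hypothetical, and real OPA path selection is not modelled.

```python
from collections import Counter
from itertools import cycle


def disperse(messages: list[str], paths: list[str]) -> list[tuple[str, str]]:
    """Assign each message to a path in round-robin order."""
    if not paths:
        raise ValueError("need at least one path")
    return list(zip(messages, cycle(paths)))


messages = [f"msg{i}" for i in range(6)]
paths = ["via-spine-1", "via-spine-2", "via-spine-3"]
print(Counter(path for _, path in disperse(messages, paths)))

# If one path fails, the remaining paths absorb its share of the traffic.
paths.remove("via-spine-2")
print(Counter(path for _, path in disperse(messages, paths)))
```

Each of the three paths carries two of the six messages; after one path is removed, the other two carry three each, which is the redundancy and load-balancing effect in miniature.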
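The difference Traffic Flow Optimisation makes can be shown with a tick-based link model that sends one fixed-size container per tick. Under the assumptions of this sketch (hypothetical `Packet` and `transmit` names, integer priorities, one container per tick), a preemptive scheduler lets an urgent packet overtake a bulk transfer that has already started, while a non-preemptive one makes it wait.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    name: str
    priority: int    # higher value = more urgent
    arrival: int     # tick at which the packet is ready to send
    containers: int  # size in fixed-size containers (must be >= 1)


def transmit(packets: list[Packet], preempt: bool) -> dict[str, int]:
    """Send one container per tick; return the tick at which each packet completes."""
    if any(p.containers < 1 for p in packets):
        raise ValueError("every packet needs at least one container")
    remaining = {p.name: p.containers for p in packets}
    finish: dict[str, int] = {}
    current = None  # packet in flight
    tick = 0
    while len(finish) < len(packets):
        ready = [p for p in packets if p.arrival <= tick and remaining[p.name] > 0]
        if ready:
            # Without preemption, a started packet keeps the link until it is done.
            if preempt or current is None or remaining[current.name] == 0:
                current = max(ready, key=lambda p: p.priority)
            remaining[current.name] -= 1
            if remaining[current.name] == 0:
                finish[current.name] = tick + 1
        tick += 1
    return finish


packets = [Packet("bulk", priority=0, arrival=0, containers=8),
           Packet("urgent", priority=1, arrival=3, containers=2)]
print(transmit(packets, preempt=False))
print(transmit(packets, preempt=True))
```

With an 8-container bulk packet starting at tick 0 and a 2-container urgent packet ready at tick 3, the urgent packet completes at tick 10 without preemption and at tick 5 with it; the bulk packet completes at tick 8 and tick 10 respectively.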
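Packet Integrity Protection, as described above, handles errors at the link level: the receiver checks each unit against a CRC and, on a mismatch, asks the sender on the same link to replay it from a buffer, so no end-to-end timeout is involved. This minimal sketch uses `zlib.crc32` as a stand-in for OPA's own link-level CRC and injects random single-bit errors; `send_over_link` and its parameters are hypothetical.

```python
import random
import zlib


def send_over_link(frames: list[bytes], error_rate: float,
                   seed: int = 0) -> tuple[list[bytes], int]:
    """Deliver frames over a noisy link, replaying any frame that fails its CRC.

    Returns the delivered frames and the number of link-level replays.
    """
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    rng = random.Random(seed)
    delivered: list[bytes] = []
    replays = 0
    for frame in frames:
        crc = zlib.crc32(frame)  # sender computes a CRC and keeps the frame for replay
        while True:
            received = bytearray(frame)
            if received and rng.random() < error_rate:
                bit = rng.randrange(len(received) * 8)  # inject a transient single-bit error
                received[bit // 8] ^= 1 << (bit % 8)
            if zlib.crc32(received) == crc:  # receiver verifies the CRC
                delivered.append(bytes(received))
                break
            replays += 1  # receiver requests a replay over the same link
    return delivered, replays


frames = [bytes([i]) * 64 for i in range(100)]
delivered, replays = send_over_link(frames, error_rate=0.2, seed=1)
print(delivered == frames, replays)
```

Because CRC-32 detects every single-bit error, all frames arrive intact; the error rate only changes how many local replays are needed.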
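For Dynamic Lane Scaling, the bandwidth left after a lane failure is simple arithmetic if throughput is assumed to scale linearly with the number of active lanes of the 100 Gbps 4x link. That linear scaling is an assumption of this sketch rather than a published OPA figure, and `degraded_bandwidth` is a hypothetical helper.

```python
LANES_PER_LINK = 4
LINK_BANDWIDTH_GBPS = 100.0


def degraded_bandwidth(failed_lanes: int) -> float:
    """Usable bandwidth of a 4x link, assuming it scales with active lanes."""
    if not 0 <= failed_lanes <= LANES_PER_LINK:
        raise ValueError(f"failed_lanes must be between 0 and {LANES_PER_LINK}")
    active = LANES_PER_LINK - failed_lanes
    return LINK_BANDWIDTH_GBPS * active / LANES_PER_LINK


for failed in range(LANES_PER_LINK):
    print(f"{failed} failed lane(s): {degraded_bandwidth(failed):.0f} Gbps")
```

Under this assumption, one failed lane leaves 75 Gbps, two leave 50 Gbps, and three leave 25 Gbps: the job slows but keeps running instead of falling back to a checkpoint.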