Written by Actu IA
The computational needs of AI researchers continue to increase as the complexity of Deep Learning (DL) networks and training data grows exponentially. Previously, training was limited to a few GPUs, often in workstations. Today, training routinely uses tens, hundreds, or even thousands of GPUs to evaluate and optimize different model configurations and parameters. In addition, organizations have multiple AI researchers who all need to train many models simultaneously. These facilities are the hallmark of the world’s leading research labs and universities, fueling the innovation that propels scientific efforts of all kinds.
Designing and implementing a large-scale computing infrastructure for AI requires understanding the computational goals of these researchers in order to build fast, efficient, and cost-effective systems. To build a flexible system that can run a multitude of Deep Learning applications in a scalable manner, organizations need a well-rounded system that includes, at a minimum:
- Scalable and powerful nodes with multiple GPUs, large memory, and fast connections between GPUs for computation to support the variety of DL models in use.
- A high-bandwidth, low-latency HDR InfiniBand (IB) interconnect designed with the capacity and topology to minimize bottlenecks.
- A storage system capable of delivering maximum performance across the varied data structures and access patterns of DL workloads.
NVIDIA's DGX SuperPOD reference architecture introduces compute blocks called Scalable Units (SUs) that enable the modular deployment of a full 140-node DGX SuperPOD, which can then scale to hundreds of nodes.
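The SU-based design reduces cluster sizing to simple multiplication. As a minimal sketch, assuming the published reference composition of 20 DGX A100 nodes per SU and 8 GPUs per node (so the full 140-node system corresponds to 7 SUs), deployment scale can be computed like this:

```python
# Sketch: sizing a DGX SuperPOD deployment from Scalable Units (SUs).
# Assumption: 20 DGX A100 nodes per SU and 8 GPUs per node, consistent
# with the 140-node reference design (7 SUs); adjust for your deployment.

NODES_PER_SU = 20
GPUS_PER_NODE = 8

def superpod_size(num_sus: int) -> dict:
    """Return node and GPU counts for a deployment of num_sus Scalable Units."""
    nodes = num_sus * NODES_PER_SU
    return {
        "scalable_units": num_sus,
        "nodes": nodes,
        "gpus": nodes * GPUS_PER_NODE,
    }

# Full 140-node SuperPOD: 7 SUs -> 140 nodes, 1120 GPUs
print(superpod_size(7))
```

Starting with a single SU and adding further SUs as demand grows is what makes the deployment modular: each increment brings a known quantity of compute and a pre-validated network topology.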
This architecture has been leveraged in the NVIDIA DGX SATURNV infrastructure that powers NVIDIA’s research and development in autonomous vehicles, natural language processing, robotics, graphics, HPC and other areas. Organizations looking to deploy their own supercomputing infrastructure can leverage the NVIDIA DGX SuperPOD solution for enterprises, deployed in a turnkey infrastructure solution, along with a full lifecycle of advanced services from planning and design to deployment and ongoing optimization.
NVIDIA Networking Solutions for DGX SuperPOD
NVIDIA Networking solutions, formerly Mellanox, the leader in InfiniBand networking infrastructure and acquired by NVIDIA in 2020, are an integral part of the DGX SuperPOD infrastructure. Sonia Cheriet, Sales Manager South Europe at NVIDIA Networking, tells us more.
What role does the NVIDIA Networking infrastructure play in the SuperPODS architecture?
Sonia Cheriet, Sr. Channel Sales Manager – Southern Europe, Mellanox Networking Solutions:
"With NVIDIA Networking technology, we are redefining the data center with an architecture that can parallelize the most complex problems and solve them as quickly as possible. The DGX A100 comes with new Mellanox ConnectX-6 VPI network adapters with 200 Gbit/s HDR InfiniBand - up to nine interfaces per system. We are taking advantage of Mellanox switching to make it easier to interconnect systems and achieve SuperPOD scale."
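The figures quoted above imply substantial per-system network capacity. A quick back-of-the-envelope calculation, taking the stated maximum of nine ConnectX-6 adapters at 200 Gbit/s HDR InfiniBand each:

```python
# Sketch: aggregate network bandwidth of one DGX A100 system,
# using the figures quoted above: up to nine ConnectX-6 interfaces,
# each running 200 Gbit/s HDR InfiniBand.

HDR_GBITS_PER_PORT = 200  # HDR InfiniBand line rate per adapter
MAX_PORTS = 9             # maximum interfaces per DGX A100 system

aggregate_gbits = HDR_GBITS_PER_PORT * MAX_PORTS
print(f"Aggregate: {aggregate_gbits} Gbit/s "
      f"(~{aggregate_gbits / 8:.0f} GB/s) per system")
```

This works out to 1.8 Tbit/s of aggregate bandwidth per system, which is what allows multi-node training jobs to exchange gradients without the network becoming the bottleneck.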
What are the advantages of these solutions compared to other networking players?
Sonia Cheriet, Sr. Channel Sales Manager – Southern Europe, Mellanox Networking Solutions:
"Our end-to-end, GPU-accelerated, InfiniBand and Ethernet-enabled networking solutions enable enterprises to implement a network infrastructure that can support complete implementations from development to deployment with all modern workloads and diverse storage requirements, paving the way for the new era of accelerated computing to maximize your return on investment in AI."
Can NVIDIA Networking solutions be found in other applications?
NVIDIA Networking solutions address the exponentially growing demands for compute power, efficiency, manageability, and scalability across the HPC, Web 2.0, machine learning, data analytics, and storage markets. We are the only vendor to offer complete end-to-end solutions supporting both InfiniBand and Ethernet networking technologies.
Contributed by Martin Jezequel, Product Manager Data Center Solutions, PNY Technologies & Sonia Cheriet Sr. Channel Sales Manager – Southern Europe, Mellanox Networking Solutions.