Fluid and immersion is the new cool at Supercomputing ’22 • The Register
SC22 It’s safe to say that liquid cooling was a hot topic at this week’s supercomputing conference in Dallas.
As far as the eye could see, the exhibit hall was packed with liquid-cooled servers, oil-filled immersion cooling tanks, and every fixture, pump, and coolant distribution unit (CDU) you could possibly need to deploy the technology in a data center.
Given that this conference is about high performance computing, the emphasis on thermal management shouldn’t really come as a surprise. But with 400W CPUs and 700W GPUs now in the wild, this is hardly an issue exclusive to HPC or AI. As more companies look to add AI/ML systems to their data centers, 3kW, 5kW, or even 10kW servers aren’t that crazy anymore.
So here’s a breakdown of the liquid cooling kit that caught our eye at this year’s show.
Direct liquid cooling
The vast majority of the liquid cooling systems being shown at SC22 were direct liquid systems. These replace the copper or aluminum heat sinks, and most of the fans, with cold plates, rubber hoses, and fittings.
If we’re being honest, these cold plates all look more or less the same. They are essentially just a hollowed out block of metal with an inlet and outlet for liquid to pass through. Note that we’re using the word “liquid” here because liquid-cooled systems can use any number of coolants that aren’t necessarily water.
A liquid-cooled server from Supermicro equipped with CoolIT cold plates.
In many cases, OEMs source their cold plates from the same vendors. For example, CoolIT provides liquid cooling hardware for several OEMs, including HPE and Supermicro.
But that doesn’t mean there isn’t room for differentiation. The inside of these cold plates is filled with micro-fins that can be adjusted to optimize the flow of liquid through them. Depending on how big or how many chips need to be cooled, the interior of these cooling plates can vary quite a bit.
Most of the liquid-cooled systems we saw on the show floor used some sort of rubber hose to connect the cold plates. This means liquid only cools specific components like the CPU and GPU. So while most of the fans can be removed, some airflow is still required.
HPE demonstrates its latest liquid-cooled Cray EX blades with AMD’s 96-core Epyc 4 CPUs.
The exceptions to this rule were Lenovo’s Neptune and HPE Cray’s EX blades. Their systems are purpose-built for liquid cooling and packed with copper tubing, manifold blocks, and cold plates for everything including the CPU, GPU, memory, and NICs.
With this approach, HPE has managed to pack eight of AMD’s 400 W Epyc 4 Genoa CPUs into a 19-inch chassis.
A liquid-cooled Lenovo Neptune server configured with two AMD Genoa CPUs and four Nvidia H100 GPUs.
Meanwhile, Lenovo showed off a 1U Neptune system designed to cool a pair of 96-core Epycs and four of Nvidia’s H100 SXM GPUs. Depending on the implementation, manufacturers claim that their direct liquid-cooled systems can dissipate between 80 and 97 percent of the heat generated by the server.
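For a back-of-the-envelope sense of what those capture ratios mean in practice, here’s a quick Python sketch. The 3kW server figure is our own illustrative assumption, not a vendor spec:

```python
# Back-of-the-envelope: how much heat still leaves a direct
# liquid-cooled server as air, given a claimed heat-capture ratio.
# All figures are illustrative assumptions, not vendor specs.

def residual_air_load(server_watts: float, capture_ratio: float) -> float:
    """Heat (in watts) the remaining chassis fans must still remove."""
    return server_watts * (1.0 - capture_ratio)

# A hypothetical 3kW node at both ends of the 80-97 percent
# capture range quoted on the show floor:
for ratio in (0.80, 0.97):
    print(f"{ratio:.0%} capture -> {residual_air_load(3000, ratio):.0f} W to air")
```

Even at the top end of that range, a few tens of watts per server still has to be moved as air, which is why most direct liquid systems keep some fans.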
Immersion cooling
One of the more exotic liquid cooling technologies showcased at SC22 was immersion cooling, which has come back into vogue in recent years. These systems can absorb 100 percent of the heat generated by the system.
Instead of retrofitting servers with cold plates, immersion cooling tanks, like this one from Submer, submerge the whole system in non-conductive liquid.
As crazy as it sounds, we’ve been submerging computer components in non-conductive liquids to keep them cool for decades. One of the most famous immersion-cooled systems was the Cray-2 supercomputer.
While the fluids used in these systems vary from vendor to vendor, synthetic oils from Exxon or Castrol or specialty refrigerants from 3M are not uncommon.
Submer was one of several immersion cooling companies showcasing their technology at SC22 this week. The company’s SmartPods look a bit like chest freezers filled with oil, with servers plugged in vertically from above.
Submer offers multiple sizes of tanks that roughly correspond to traditional half- and full-size racks. These tanks are rated for 50-100 kW of heat dissipation, putting them on par with rack-mounted air and liquid cooling infrastructure in terms of power density.
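Those ratings make for a simple capacity sum. A quick sketch, where the 2kW per-server draw is our own assumed figure for illustration:

```python
# Quick sizing sketch: how many servers a tank of a given rating can
# hold before it runs out of cooling headroom. The 2 kW per-server
# draw is an assumed figure for illustration.

def servers_per_tank(tank_kw: float, server_kw: float) -> int:
    """Whole servers a tank's heat-dissipation rating can support."""
    return int(tank_kw // server_kw)

print(servers_per_tank(100, 2))  # a full-size 100 kW tank of 2 kW nodes -> 50
print(servers_per_tank(50, 2))   # a half-size 50 kW tank -> 25
```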
Submer’s tank supports OCP OpenRack form factors, such as these three-blade Intel Xeon systems.
The demo tank held three 21-inch OCP chassis, each with three two-socket Intel Sapphire Rapids blades, as well as a standard 2U AMD system that had been rebuilt for use in Submer’s tanks.
However, we’re told the number of modifications required, particularly to the OCP chassis, is fairly negligible, with the only real changes being swapping out moving parts in components like power supplies.
As you might expect, immersion cooling is more difficult to maintain and a lot messier than air or direct liquid cooling.
Iceotope’s spin on immersion cooling uses the server case as a reservoir.
Not every immersion cooling setup on the show floor requires gallons of specialty liquids. One example was Iceotope’s in-chassis immersion cooling system. The company’s sealed server case acts as a reservoir, with the motherboard submerged in a few millimeters of liquid.
A redundant pump on the back of the server circulates oil over hotspots such as the CPU, GPUs, and memory, before passing the hot fluid through a heat exchanger. There, the heat is transferred to a facility water system or a rack-scale coolant distribution unit (CDU).
Whether you use direct-to-chip or immersion cooling, both systems require additional infrastructure to extract and dissipate the heat. For direct liquid-cooled setups, this can include manifolds, rack-level piping, and most importantly, one or more CDUs.
Large rack-sized CDUs can be used to cool an entire row of server cabinets. For example, Cooltera showed several large CDUs that can supply a data center with up to 600 kW of cooling. For smaller deployments, a rack-mounted CDU can also be used. We looked at two examples from Supermicro and Cooltera that offer between 80 and 100 kW of cooling capacity.
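How far a given CDU stretches depends on the racks behind it. A rough sketch: the CDU capacities below come from the show-floor examples, but the 40kW rack density and 10 percent reserve are our own illustrative assumptions:

```python
# Sketch: matching CDU capacity to rack power. The CDU capacities
# come from the show-floor examples; the 40 kW rack density and the
# 10 percent reserve are illustrative assumptions.

def racks_supported(cdu_kw: float, rack_kw: float, headroom: float = 0.9) -> int:
    """Racks a single CDU can serve while keeping some capacity in reserve."""
    return int((cdu_kw * headroom) // rack_kw)

print(racks_supported(600, 40))  # row-scale 600 kW CDU -> 13 racks
print(racks_supported(100, 40))  # rack-mounted 100 kW CDU -> 2 racks
```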
A rack-mounted coolant distribution unit from Cooltera.
These CDUs consist of three main components: a heat exchanger, redundant pumps to circulate the coolant through the racks, and a filtration system to prevent particles from clogging critical components such as the micro-fins of the cold plates.
How the heat is actually extracted from the coolant system is highly dependent on the type of heat exchanger used. Liquid-to-air heat exchangers are among the simplest because they require the fewest modifications to the equipment itself. The Cooltera CDU, pictured here, uses large radiators to transfer the heat trapped by the liquid into the hot data center aisle.
In addition to pumps and filtration, this Cooltera CDU has an integrated liquid-to-air heat exchanger.
However, the majority of the CDUs we saw at SC22 used liquid-to-liquid heat exchangers. The idea here is to use a separate plant-wide water system to transport the heat collected from multiple CDUs to dry coolers on the outside of the building, where it is released into the air. Instead of releasing the heat into the atmosphere, some data centers, like Microsoft’s latest facility in Helsinki, have connected their water systems to district heating systems.
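The flow rates involved fall out of a standard heat-transfer identity, Q = ṁ·cp·ΔT. Here’s a quick sketch using water properties; the 100kW load and 10K temperature rise are assumed figures, not vendor numbers:

```python
# Sketch: the coolant flow a CDU loop needs to move a given heat
# load, via Q = m_dot * c_p * delta_T. Water properties are standard;
# the 100 kW load and 10 K temperature rise are assumed figures.

CP_WATER = 4186.0   # specific heat of water, J/(kg*K)
RHO_WATER = 1000.0  # density of water, kg/m^3

def flow_lpm(load_watts: float, delta_t_k: float) -> float:
    """Required coolant flow in litres per minute."""
    kg_per_s = load_watts / (CP_WATER * delta_t_k)
    litres_per_s = kg_per_s / RHO_WATER * 1000.0
    return litres_per_s * 60.0

# A 100 kW rack-scale CDU running a 10 K inlet-to-outlet rise:
print(f"{flow_lpm(100_000, 10):.0f} L/min")  # -> 143 L/min
```

Running a wider temperature delta cuts the required flow proportionally, which is one reason facility loops run hotter than you might expect.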
With immersion cooling, the situation is largely the same, although many of the components of the CDU, such as pumps, liquid-to-liquid heat exchangers, and filtration systems, are built into the tanks. All that is really required is connection to the facility’s water system.
The adoption of liquid cooling is increasing
While liquid cooling represents a fraction of data center thermal management expenditures today, hotter components and higher rack power densities are beginning to drive adoption of the technology.
According to a recent report by Dell’Oro Group, spending on liquid and immersion cooling equipment is expected to reach $1.1 billion by 2026, or 19 percent of thermal management spending.
Meanwhile, rising energy prices and a growing emphasis on sustainability make liquid cooling attractive on other fronts. Setting aside the practicality of air-cooling a 3kW server in the first place, 30 to 40 percent of a data center’s energy consumption can be attributed to the air conditioning and ventilation equipment required to keep systems at operating temperature.
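To put that 30-40 percent share in perspective, here’s a simple sketch of what it implies for total facility draw. The 1MW IT load is an assumed example, and other overheads like power conversion are ignored:

```python
# Sketch: what a 30-40 percent cooling share implies for total
# facility draw. The 1 MW IT load is an assumed example, and other
# overheads like power conversion and lighting are ignored.

def total_draw_kw(it_kw: float, cooling_fraction: float) -> float:
    """Total facility power if cooling takes this fraction of the total."""
    return it_kw / (1.0 - cooling_fraction)

# A hypothetical 1 MW IT load at both ends of the quoted range:
for frac in (0.30, 0.40):
    print(f"{frac:.0%} cooling -> {total_draw_kw(1000, frac):.0f} kW total")
```

In other words, under these assumptions a facility can end up buying an extra 400-700kW just to keep a megawatt of compute at temperature, which is the overhead liquid cooling promises to shrink.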
So while server vendors have found ways to air-cool systems as large as 10kW, in the case of Nvidia’s DGX H100, at these power and heat densities there are external incentives to cut the amount of energy spent keeping computers cool. ®