Liquid Cooling for AI and Machine Learning

Data centers worldwide are pushing to increase energy efficiency and consolidate operations in order to provide more computational power at lower cost. As Artificial Intelligence (AI) and Machine Learning (ML) algorithms become more prominent in high-performance computing systems, data centers must meet the increased power demands and thermal challenges that come with processing exponentially growing data sets in less time. High-performance liquid cooling is a critical thermal management tool for meeting that goal.

Thermal Management for Artificial Intelligence (AI)

Today’s custom silicon chips executing Machine Learning functions require highly effective thermal management to perform at their designed computing spec. AI systems must process vast data sets and contend with the I/O latency associated with neural network (NN) operations on this data. Faster computing requires more power, and the cooling resources that keep chips at reliable temperatures can be limited at both the chip level and the facility scale. For this reason, the heat exchangers that remove thermal power from these chips must be as effective as possible to keep critical AI board components running optimally.

AI Hardware Development

The accelerating demand for faster computing in the AI market has resulted in a wide array of new silicon layouts and form factors, from SoCs and chiplets to wafer-scale silicon. Each new design targets more processing power, which generates more heat and thus raises silicon temperatures, adversely impacting processor speed and reliability.

For the newest silicon designs to perform as desired, a thermal management solution is needed that can dissipate high heat fluxes (W/cm²) with minimal temperature rise. Liquid cooling can dramatically improve the performance of high-power chips because of the higher heat capacity of liquids, and microchannel liquid cooling can lower temperatures even further than standard liquid cooling designs. Mikros’ microchannel cooling designs are among the world’s most efficient methods to remove heat from an AI or Machine Learning system.

Liquid Cooling for AI Chips

Liquid cooling has become necessary in thermal management for AI chips because the higher specific heat of liquids lets them carry away the heat densities generated by today’s chips. At Mikros Technologies, we offer state-of-the-art microchannel liquid cooling solutions that remove unwanted heat from AI chips with the lowest system impact, temperature rise and pressure drop. Our microchannel cold plates offer orders-of-magnitude lower thermal resistance than other designs.

The advantages of a Mikros Technologies thermal management solution include:

  • Highest Cooling Capacity: The ultra-low thermal resistance of Mikros cold plates (less than 0.02 °C·cm²/W) means that more power can be dissipated with a very low temperature rise at the silicon junction. A Mikros cold plate can therefore dissipate over 1 kW/cm² with as little as a 20 °C temperature rise at the chip, even at low coolant flow rates.
  • Low Pressure Drop: Microchannel liquid cooling has historically been associated with high pressure drop because of the micro-sized flow channels. Mikros’ patented designs avoid high pressure drops with proprietary matrix configurations that distribute coolant flow evenly across a silicon surface.
  • Normal Flow Heat Transfer: Mikros cooling orients coolant flow “normal,” or perpendicular, to a heated surface, increasing heat transfer several-fold over “parallel flow.” The result is more heat transferred per unit volume of coolant, which permits lower flow rates, warmer coolant inlet temperatures, and smaller pumps and other peripheral components.
  • Tailored Cooling: Normal flow cooling also allows our design team to direct more coolant flow to higher-power areas of a chip, minimizing temperature gradients and energy usage. We also incorporate our microchannels into assemblies with other lower-power cooling designs, such as machined “mini-channels” and swaged tube cold plates, to remove heat from other components with the extra thermal budget saved by our microchannel designs.
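
The figures in the first bullet follow from the definition of area-specific thermal resistance: temperature rise equals heat flux multiplied by resistance. The short Python sketch below shows that arithmetic. The function names are illustrative assumptions, not part of any Mikros tool, and the second call uses a hypothetical resistance value chosen only for comparison.

```python
def temperature_rise_c(heat_flux_w_per_cm2: float,
                       resistance_c_cm2_per_w: float) -> float:
    """Temperature rise (deg C) = heat flux (W/cm^2) x area-specific resistance (C-cm^2/W)."""
    return heat_flux_w_per_cm2 * resistance_c_cm2_per_w


def max_heat_flux_w_per_cm2(allowed_rise_c: float,
                            resistance_c_cm2_per_w: float) -> float:
    """Largest heat flux (W/cm^2) that stays within the allowed temperature rise."""
    return allowed_rise_c / resistance_c_cm2_per_w


# Figures quoted in the text: 0.02 C-cm^2/W at 1 kW/cm^2 (1000 W/cm^2).
print(f"Rise at 1 kW/cm^2: {temperature_rise_c(1000.0, 0.02):.1f} C")

# Hypothetical lower resistance (0.01 C-cm^2/W), for comparison only.
print(f"Flux for a 20 C rise: {max_heat_flux_w_per_cm2(20.0, 0.01):.0f} W/cm^2")
```

Because the quoted resistance is an upper bound ("less than 0.02"), the 20 °C result is a worst case for a 1 kW/cm² load; any lower resistance reduces the rise proportionally.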

Liquid Cooling Versus Air Cooling Systems

Historically, semiconductors have been cooled with fans producing airflow over pin-finned heat sinks attached to the chip lid. Air cooling is still a very economical option for lower-power chips, but it has limitations. At current AI chip power densities, the heat-carrying capacity of air, the convection coefficients of heat sinks and the power of even the most efficient fans are too low to dissipate waste heat fast enough, resulting in temperature spikes that cause processors to slow down or fail.

The higher heat capacity of water, glycols and other liquid coolants provides much higher cooling capacity per unit of coolant flow. Still, the heat transfer mechanism designed into the liquid cold plate is critical to efficient heat transfer and chip performance. Along the liquid-cooling spectrum, Mikros microchannel cold plates deliver 1-2 orders of magnitude (10x-100x) higher cooling capacities at lower cost and input than other designs.
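
To make the heat-capacity argument concrete, the sketch below estimates the coolant flow needed to carry away a given heat load using the energy balance Q = ṁ·cp·ΔT. The 1 kW load and 10 K coolant temperature rise are illustrative assumptions, not figures from Mikros, and the fluid properties are approximate textbook values near room temperature.

```python
def volumetric_flow_l_per_min(power_w: float,
                              density_kg_per_m3: float,
                              cp_j_per_kg_k: float,
                              delta_t_k: float) -> float:
    """Coolant flow (L/min) needed to absorb power_w with a coolant temperature rise of delta_t_k."""
    mass_flow_kg_per_s = power_w / (cp_j_per_kg_k * delta_t_k)  # from Q = m_dot * cp * dT
    flow_m3_per_s = mass_flow_kg_per_s / density_kg_per_m3
    return flow_m3_per_s * 1000.0 * 60.0  # m^3/s -> L/min


# Assumed operating point (illustrative only).
POWER_W = 1000.0
DELTA_T_K = 10.0

# Approximate properties near room temperature.
water = volumetric_flow_l_per_min(POWER_W, 998.0, 4186.0, DELTA_T_K)
air = volumetric_flow_l_per_min(POWER_W, 1.2, 1005.0, DELTA_T_K)

print(f"Water: {water:.2f} L/min")
print(f"Air:   {air:.0f} L/min")
print(f"Air needs about {air / water:.0f}x the volumetric flow of water")
```

Under these assumptions, water needs on the order of 1.4 L/min while air needs roughly 5,000 L/min, a difference of several thousand times in volumetric flow. The sketch covers only the coolant's capacity to carry heat; it does not model the cold plate's thermal resistance, which the paragraph above identifies as the other critical factor.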

Liquid Cooling Advantages

There are many benefits of Mikros microchannel liquid cooling for AI systems, including:

  • Low thermal resistance.
  • High heat flux capacity.
  • High thermal effectiveness.
  • Customizable cooling areas.
  • Space-saving capabilities.
  • Environmental sealing.
  • Easy maintenance and installation.
  • Low noise levels.

Microchannel Cold Plates at Mikros Technologies

For high-efficiency AI liquid cooling systems, Mikros Technologies continues to design and produce best-in-class thermal management solutions. We meet custom design requirements for performance, space, low pressure drop and thermal resistance, so you receive solutions that are tailored to your needs. Our microchannel liquid cold plates provide industry-leading performance, offering high cooling effectiveness for AI and ML systems.

Contact us today for more information on our thermal management systems.