Cloudflare to Deploy NVIDIA GPUs on Edge, Partners with Microsoft and Hugging Face

by Cindy Tan

Published: September 27, 2023 at 10:24 am Updated: September 28, 2023 at 6:07 am

by Victor Dey

Edited and fact-checked: September 27, 2023 at 10:24 am

In Brief

Cloudflare announced that it will deploy NVIDIA GPUs to provide customers with access to local compute power.

The company also announced AI partnerships with Microsoft and Hugging Face.

Cloudflare today announced that it will deploy NVIDIA GPUs on the edge, featuring NVIDIA’s full-stack inference software, including NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server.

The company aims to accelerate the performance of AI applications, including large language models. From today, all Cloudflare customers can access local computing power to deliver AI applications and services. Additionally, the company will offer pay-as-you-go compute power at scale for the first time, eliminating the need for businesses to make large upfront investments.

With the increased demand for GPUs driven by the development of AI applications, Cloudflare aims to make generative AI inferencing accessible globally.

Through NVIDIA GPUs in its global edge network, Cloudflare will now offer low-latency generative AI experiences for end users. The company said that these GPUs will be accessible for inference tasks in over 100 cities by the end of 2023 and across its network by the end of 2024.

“We’ve already secured all the GPUs that we need in order to complete the build out through the end of 2023 and are confident in our ability to continue to secure GPUs after that,” Matthew Prince, Cloudflare’s co-founder & CEO, told Metaverse Post.

Furthermore, Cloudflare said that the GPU deployment will provide customers with access to compute power situated near their data. This proximity helps keep data handling aligned with regional and global regulations.

“Having control over where inference is run can help with data sovereignty, to make sure that user requests always abide by regulations like GDPR and ensure that data stays within locales,” said Prince.

AI Partnership with Microsoft

Cloudflare also announced a partnership with Microsoft today. While its deployment of NVIDIA GPUs is designed to bring customers’ data closer to computational power, its partnership with Microsoft aims to streamline AI operations by enabling location flexibility.

Cloudflare said that this collaboration will enable businesses to deploy AI models across a continuum encompassing devices, network edges and cloud environments, optimizing both centralized and distributed computing models.

Utilizing ONNX Runtime across these three tiers, Cloudflare and Microsoft aim to ensure that AI models run wherever it’s most efficient within this architecture.

AI model training demands substantial computational and storage resources located close to one another, which favors centralized cloud platforms. Inference tasks, in contrast, are expected to shift toward more distributed locations, including devices and edge networks.

The company asserts that it can provide the infrastructure to direct traffic across different environments, based on factors such as connectivity, latency, compliance and more.

As a result, businesses will be able to optimize the location for AI tasks, deploying AI inference where it aligns best with achieving their desired outcomes. For instance, a security camera system can leverage edge networks for object detection, overcoming device limitations without the latency associated with sending data to a central server for processing.

Additionally, organizations will be able to adapt to changing needs by running models in all three locations—devices, edge networks, and the cloud—and making adjustments or fallbacks based on factors such as availability, use case, and latency requirements. This adaptability ensures that AI operations remain responsive and effective in evolving circumstances.
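The fallback behavior described above can be illustrated with a minimal Python sketch. It is not Cloudflare's or Microsoft's implementation: the `Tier` type, the `choose_tier` function, and all latency and region values are hypothetical, chosen only to show how a request might be placed on a device, an edge network, or the cloud based on availability, compliance, and latency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    """One place a model could run. All fields are illustrative assumptions."""
    name: str          # "device", "edge", or "cloud"
    available: bool    # whether the tier is reachable right now
    latency_ms: float  # estimated latency for this request
    region: str        # where the data would be processed


def choose_tier(tiers: list[Tier], max_latency_ms: float,
                allowed_regions: set[str]) -> Optional[Tier]:
    """Pick a tier for one inference request.

    Tiers are tried in the caller's preference order. Unavailable or
    non-compliant tiers are skipped. If no remaining tier meets the latency
    budget, fall back to the fastest remaining one. Return None if nothing
    is both available and compliant.
    """
    compliant = [t for t in tiers
                 if t.available and t.region in allowed_regions]
    if not compliant:
        return None
    for tier in compliant:
        if tier.latency_ms <= max_latency_ms:
            return tier
    return min(compliant, key=lambda t: t.latency_ms)


# Hypothetical example: the device is offline, so the request lands on the edge.
tiers = [
    Tier("device", available=False, latency_ms=5.0, region="eu"),
    Tier("edge", available=True, latency_ms=20.0, region="eu"),
    Tier("cloud", available=True, latency_ms=120.0, region="us"),
]
choice = choose_tier(tiers, max_latency_ms=50.0, allowed_regions={"eu"})
```

Keeping the preference order in the caller's hands means the same function can favor on-device inference for one use case and the cloud for another, with only the list order changing.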

Moreover, Cloudflare said it will offer a streamlined deployment process, enabling businesses to access easily deployable models and machine learning tools through Microsoft Azure Machine Learning on Workers AI.

“As companies explore the best way to harness the power of generative AI in unique ways to meet their needs, the ability to run AI models anywhere is paramount,” said Rashmi Misra, GM of Data, AI, & Emerging Technologies at Microsoft.

The First Serverless GPU Partner of Hugging Face

Alongside the announcement of the collaboration with Microsoft, Cloudflare unveiled a partnership with Hugging Face. Through the partnership, Cloudflare will become the first serverless GPU partner for deploying Hugging Face models.

This aims to enable developers to deploy AI worldwide, without infrastructure management or paying for unused compute capacity.

“Small companies have several challenges when trying to create new AI applications. One of those challenges is scarcity of GPUs across the globe,” said Cloudflare CEO Matthew Prince.

“We think a serverless, multi-tenant model is necessary to support companies of any size and allow them to pay for exactly what they use. We don’t want large companies reserving GPUs and monopolizing the AI inference market.”

The company said that Hugging Face’s most popular models will integrate into Cloudflare’s model catalog and be optimized for its global network. This integration makes the most popular models accessible for developers worldwide.

Developers will also be able to deploy Workers AI with a single click directly from Hugging Face. This streamlined process empowers developers to focus on coding and AI application development.

“Hugging Face and Cloudflare both share a deep focus on making the latest AI innovations as accessible and affordable as possible for AI builders,” said Clem Delangue, CEO, Hugging Face. “We’re excited to offer serverless GPU services in partnership with Cloudflare to help developers scale their AI apps from zero to global, with no need to wrangle infrastructure or predict the future needs of their application — just pick your model and deploy.”

About The Author

Cindy is a journalist at Metaverse Post, covering topics related to web3, NFT, metaverse and AI, with a focus on interviews with Web3 industry players. She has spoken to over 30 C-level execs and counting, bringing their valuable insights to readers. Originally from Singapore, Cindy is now based in Tbilisi, Georgia. She holds a Bachelor's degree in Communications & Media Studies from the University of South Australia and has a decade of experience in journalism and writing. Get in touch with her via [email protected] with press pitches, announcements and interview opportunities.
