
Is Nvidia’s $2 Trillion Empire Built on Sand? Why Custom Chips Threaten Its Dominance

Nvidia, which now earns $12.3 billion in profit in a single quarter, is the envy of the semiconductor industry. For the first time, GPU profits have proven high enough to support a market value of $2 trillion. But will Nvidia really be satisfied with that?

Nvidia CEO Jensen Huang said in a 2008 speech that the company should put studying customer needs and solving customer problems first, rather than focusing on competitors: if you fixate on how to take customers from rivals, you miss the opportunity to develop new ones.

Sixteen years later, Nvidia’s CEO is still Jensen Huang. Although its market capitalization has since grown dozens, even hundreds, of times over, Nvidia under his stewardship is still on the road of constantly seeking out new customers.

According to Reuters, Nvidia is building a new business unit focused on designing custom chips, including advanced artificial intelligence processors, for cloud computing companies and others.

According to the report, Nvidia executives have met with representatives of Amazon, Meta, Microsoft, Google and OpenAI to discuss producing custom chips for them. Beyond data center chips, Nvidia is also courting customers in the telecommunications, automotive and video game industries.

The Reuters report suggests that Nvidia will enter the data center custom-chip market in force, opening a new front beyond its traditional gaming business and its newer artificial intelligence business.

So why is Nvidia doing this, and what are its odds?

The Custom-Chip Duo

Starting in 2020, in-house development and customization became buzzwords in the semiconductor industry. Ever since Apple released the M1 chip, seemingly every manufacturer has been trying to develop its own silicon to gain a cost advantage.

Hyperscalers, however, control nearly their entire hardware and software stacks, which makes them well suited to developing their own SoCs, and they started down this path much earlier. Their enormous demand for AI and cloud computing was the biggest early driver of custom chips.

Google, which rose to fame with AlphaGo, was the forerunner of custom chips.

In 2013, Jeff Dean, head of Google AI, calculated that if 100 million Android users each used three minutes of mobile voice-to-text per day, the computing power consumed would be double that of all of Google’s data centers combined, and there are far more than 100 million Android users worldwide.
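Dean’s point was a capacity-planning back-of-envelope, and its shape is easy to reproduce. In the sketch below, only the user count and minutes per day come from the article; the per-second inference cost and the fleet capacity are purely hypothetical placeholders, chosen so the ratio lands near his “2x” conclusion:

```python
# Back-of-envelope sketch of Jeff Dean's 2013 estimate.
# Only the user count and minutes per day come from the article;
# ops-per-audio-second and fleet capacity are HYPOTHETICAL placeholders.

users = 100e6                      # 100 million Android users (from the article)
seconds_per_user_per_day = 3 * 60  # 3 minutes of speech per day (from the article)

ops_per_audio_second = 30e9        # hypothetical: ~30 GFLOPs per second of audio

audio_seconds_per_day = users * seconds_per_user_per_day    # 1.8e10 s of audio/day
ops_per_day = audio_seconds_per_day * ops_per_audio_second  # 5.4e20 FLOPs/day
required_flops = ops_per_day / 86_400                       # sustained FLOP/s

fleet_flops = 3.1e15               # hypothetical total fleet capacity, FLOP/s

print(f"Sustained compute needed: {required_flops:.2e} FLOP/s")
print(f"Ratio vs. assumed fleet:  {required_flops / fleet_flops:.1f}x")
```

Whatever the exact constants were, the lesson generalizes: multiply a modest per-user cost by hundreds of millions of users and the total dwarfs any general-purpose fleet.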

At that point, Google realized that general-purpose CPUs and GPUs alone could no longer meet its future computing needs, and that the way out was custom silicon. It set itself a goal: build a domain-specific computing architecture for machine learning, and cut the total cost of ownership (TCO) of deep neural network inference to one tenth of what it was.

At the Google I/O developer conference in 2016, Google CEO Sundar Pichai officially showed the TPU to the world. The original TPU was manufactured on a 28 nm process, ran at 700 MHz and drew 40 W. Google packaged the processor as an external accelerator card that fit into a SATA hard disk slot for plug-and-play deployment. The TPU connected to its host via a PCIe Gen 3 x16 bus, providing 12.5 GB/s of effective bandwidth.
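That 12.5 GB/s figure is the usable rate after protocol overhead; the raw link arithmetic is easy to check. A quick sketch (the implied ~79% efficiency is an illustrative inference from the two numbers, not a published breakdown):

```python
# PCIe Gen 3 x16: raw link rate vs. the TPU's quoted effective bandwidth.

gts_per_lane = 8.0       # PCIe Gen 3 signals at 8 GT/s per lane
encoding = 128 / 130     # 128b/130b line encoding
lanes = 16

raw_gb_per_s = gts_per_lane * encoding * lanes / 8   # bytes/s, one direction
print(f"Raw:                {raw_gb_per_s:.2f} GB/s per direction")  # ~15.75

# The quoted 12.5 GB/s "effective" figure implies roughly 79% efficiency
# after TLP headers, flow control and other protocol overhead.
print(f"Implied efficiency: {12.5 / raw_gb_per_s:.0%}")
```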

But the first-generation TPU was not Google’s creation alone; behind it stood Broadcom.

According to a 2020 report by JP Morgan analyst Harlan Sur, TPU v1 through v4 were designed jointly with Broadcom, which at the time had already started producing TPU v4 on a 7 nm process and had begun working with Google on a 5 nm TPU v5.

Sur said Broadcom’s application-specific integrated circuit (ASIC) business had full-year revenue of $750 million in 2020, up from $50 million in 2016. Beyond chip design, Broadcom provided key intellectual property to Google and was responsible for manufacturing, testing and packaging the new chips for Google’s new data centers. Broadcom also designed ASICs for other customers such as Meta, Microsoft and AT&T.

The analyst also said in May 2022 that Meta was using custom chips to build its metaverse hardware and would become Broadcom’s next multi-billion-dollar ASIC customer. “We believe these engagements will be focused on 5 nm and 3 nm processes and will be used to support metaverse hardware architectures deployed over the next few years. Meta will become Broadcom’s next billion-dollar ASIC customer after Google within the next three to four years,” Sur said.

Even before AI’s breakout year arrived, Broadcom had stood shoulder to shoulder with Google and Meta and greatly expanded its share of the data center chip market. After artificial intelligence exploded in 2023, Microsoft launched its Maia 100 chip, and its network card, still under development, may also have Broadcom’s participation behind it. Leaning on these giants, Broadcom became an AI winner second only to Nvidia.

Broadcom’s latest earnings report reflects this. Its first-quarter fiscal 2024 results show semiconductor revenue of $7.39 billion, up 4% year-on-year and accounting for 62% of total revenue. Within that, networking revenue was $3.3 billion, up 46% year-on-year and 45% of semiconductor revenue, driven mainly by growth in custom DPU chips for two major customers; Broadcom expects networking revenue to grow 35%+ in 2024.

Broadcom’s AI-related business is worth singling out: its AI ASICs and AI-focused networking solutions are classified together as AI accelerators. In 2023, sales from this business accounted for 15% of annual semiconductor revenue, or about $4.2 billion. In the first quarter of fiscal 2024, AI revenue was about $2.3 billion, 31% of semiconductor revenue and roughly four times the figure from the same period a year earlier; it is expected to exceed 35% of semiconductor revenue in 2024. That implies AI revenue of more than $10 billion in 2024 (up from a previous estimate of $7.5 billion), year-on-year growth of roughly 133%.

Of that $10-billion-plus revenue target, custom DPU chips account for about $7 billion, switch/router chips about 20%, and optical and interconnect chips about 10%. In other words, custom chips for giants like Google, Meta and Microsoft alone bring in serious money, as the quick sanity check below shows.
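For what it’s worth, the quoted figures hang together arithmetically. A quick check, using only the article’s own numbers (not independently verified):

```python
# Sanity-checking the Broadcom figures quoted above ($B throughout).

q1_semi_rev = 7.39     # Q1 FY2024 semiconductor revenue
q1_ai_rev = 2.3        # Q1 FY2024 AI revenue
ai_2023 = 4.2          # FY2023 AI revenue (stated as 15% of semi revenue)
ai_2024 = 10.0         # implied FY2024 AI revenue target

print(f"Q1 AI share of semi revenue:  {q1_ai_rev / q1_semi_rev:.0%}")  # ~31%
print(f"Implied FY2023 semi revenue:  ${ai_2023 / 0.15:.0f}B")         # ~$28B
print(f"Implied FY2024 AI growth:     {ai_2024 / ai_2023 - 1:.0%}")
# Prints ~138%, in the same ballpark as the ~133% year-on-year figure above.
```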

Broadcom CEO Hock Tan did not hide his optimism about AI and custom chips. On the earnings call he said that in fiscal 2024, networking revenue would grow about 30% year-on-year, driven mainly by accelerating network connectivity deployments and the expansion of AI accelerators at hyperscalers, and that revenue from generative AI was expected to account for more than 25% of semiconductor revenue.

Compared with first-ranked Broadcom, Marvell is somewhat smaller in custom chips, but its strength should not be underestimated. In April 2023, Marvell released a data center chip built on TSMC’s 3 nm process, the world’s first 3 nm chip announced under the name of a chip design company.

In June 2023, the Taiwanese outlet Liberty Times reported that Marvell had won Amazon AI orders. Under the collaboration, Marvell would help design Amazon’s second-generation AI chip (Trainium 2), with the contracted design work expected to begin in the second half of 2023 and mass production to follow in 2024.

Amazon had launched its first custom machine-learning training chip, Trainium, back in December 2020, promising 30% higher throughput and 45% lower cost per inference than standard AWS GPU instances; the upgraded Trainium 2 followed in November 2023, with Marvell’s shadow behind it as well.

Marvell’s own website is more direct, describing the company as a strategic supplier to AWS that provides cloud-optimized silicon for the infrastructure needs of AWS customers, spanning electro-optics, networking, security, storage and custom design solutions. Bear in mind that Amazon is currently the world’s largest cloud service provider, and that Google reportedly intends to shift work from Broadcom to Marvell.

In Marvell’s fiscal fourth-quarter and full-year results for the year ended February 3, 2024, fourth-quarter revenue came in at $1.427 billion, above the midpoint of guidance. Chairman and CEO Matt Murphy highlighted the AI contribution: “Marvell’s fourth quarter fiscal 2024 revenue of $1.427 billion exceeded the midpoint of our guidance. The revenue growth from AI was even more impressive, driving revenue growth of 38% quarter-over-quarter and 54% year-over-year in our data center end market.”

Interestingly, in its fiscal third-quarter 2024 results, Marvell had already reported growth in cloud from developing custom chips for cloud vendors. “Cloud customers continue to focus on enhancing their AI offerings by building their own custom compute solutions, and we have won many of these designs,” Murphy said.

While it is unclear exactly which giants Marvell has partnered with, Amazon is surely among them. Judging from its acquisitions of Cavium in 2018, of Aquantia and of Avera Semiconductor (GlobalFoundries’ ASIC business) in 2019, of optical chip maker Inphi in 2021, and of network switching chip maker Innovium later that same year, Marvell’s ambitions are anything but small.

Moreover, Marvell has placed a bet arguably more pivotal than Broadcom’s reliance on the giants: as early as September 2020, it helped Groq, this year’s breakout AI chip startup, design and produce the Groq Node. Marvell provided the building blocks of the ASIC and its interfaces to the outside world, while Groq itself focused on the AI acceleration logic.

Broadcom and Marvell have earned their billing as the custom-chip duo of the AI era.

Who’s the opponent?

Broadcom and Marvell have not drawn the same attention as Nvidia: dragged down by their non-AI businesses, their recent earnings look less impressive and their share prices cannot compare with Nvidia’s. But the vast market behind them is more than enough to make Nvidia look sideways.

“Hyperscale data center companies can make their own chips more cheaply than buying them in bulk, cutting out the middleman,” says Kam Kittrell, a vice president in the Digital & Signoff Group at EDA and IP giant Cadence. “These companies are usually the users of their own cloud services and have high-value specialized software, so they can build more energy-efficient hardware tailored to that software.”

“We see the biggest growth coming from data infrastructure, including cloud, data center, networking, storage and 5G infrastructure applications,” said Sudhir Mallya, head of marketing at Alphawave Semi. “Today, the growth of custom silicon in data center infrastructure applications is amazing. We have seen this trend for several years now, since hyperscalers like Google, Microsoft, AWS and Meta all started designing their own chips.”

Alan Weckel of research firm 650 Group estimates that the market for custom data center chips will grow to $10 billion this year and double by 2025. Needham analyst Charles Shi puts the broader custom chip market at about $30 billion as of 2023, roughly 5% of annual global chip sales.

“Broadcom’s custom chip business is $10 billion, and Marvell’s is about $2 billion, which is a real threat,” said Dylan Patel, founder of chip research group SemiAnalysis. “It’s a real big negative: there are more competitors coming into the fray.”

Interestingly, Nvidia CEO Jensen Huang also touched on this broad market in his recent talk at Stanford, noting that Nvidia faces competition not only from rivals but also from customers (the cloud vendors). A customer can build a good chip (an ASIC) for one specific algorithm, he argued, but computing is not just about the transformer, and Nvidia itself keeps inventing new transformer variants.

Huang leaned on cost: people who buy and sell chips think only about the price of the chip, while people who run data centers think about total operating cost, deployment time, performance, utilization and flexibility across all these different applications. Overall, he claimed, Nvidia’s TCO is so good that even if competitors’ chips were free, they would not end up cheap enough; Nvidia’s goal is to add so much value that the alternative is never just about cost.
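Huang’s “even free chips aren’t cheap enough” line is really a claim about cost per unit of useful work rather than sticker price. The toy model below illustrates the structure of that argument; every number in it is a made-up placeholder, not a real GPU or ASIC figure:

```python
# Toy model of Huang's TCO argument: cost per unit of useful work.
# ALL numbers are hypothetical placeholders, not real GPU/ASIC figures.

def cost_per_work(chip_price, power_kw, years, utilization, work_per_hour,
                  kwh_price=0.10):
    """Chip price plus lifetime electricity, divided by work actually done."""
    hours = years * 365 * 24
    total_cost = chip_price + power_kw * hours * kwh_price
    useful_work = work_per_hour * hours * utilization
    return total_cost / useful_work

# Hypothetical GPU: expensive, but fast and highly utilized thanks to a
# mature, flexible software stack.
gpu = cost_per_work(chip_price=30_000, power_kw=0.7, years=4,
                    utilization=0.85, work_per_hour=100)

# Hypothetical "free" ASIC: zero purchase price, but much lower throughput
# and utilization (narrow workload fit, less mature software).
asic = cost_per_work(chip_price=0, power_kw=0.5, years=4,
                     utilization=0.35, work_per_hour=10)

print(f"GPU  cost per unit of work: ${gpu:.4f}")   # ~$0.0109
print(f"ASIC cost per unit of work: ${asic:.4f}")  # ~$0.0143
# With these placeholders, the "free" chip is ~30% dearer per unit of work.
```

The placeholders deliberately hand the ASIC a large throughput and utilization handicap; whether real custom parts actually suffer one that large is precisely what the two sides of this debate disagree about.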

As the man at Nvidia’s helm, Huang first expressed disdain for today’s ASICs, then noted that, whenever necessary, Nvidia can draw on its existing IP and accumulated technology to build better custom chips for customers, which squares with the Reuters report cited earlier.

“Are we willing to customize? Yes, we are. Why is the bar relatively high now? Because each generation of our platform includes a GPU first, then CPUs, network processors, software and two types of switches.

I build five chips for one generation of products. People think there is only one GPU chip, but there are actually five different chips, and each costs hundreds of millions of dollars to develop just to reach what we call ‘release’ quality. Then you have to integrate them into a system, and then you need networking equipment, transceivers, fiber optics and a great deal of software.

Running a computer the size of this room takes a lot of software, so all of it is complicated. If the customization requirements diverge too much, you have to repeat the entire development process. But if the customization can take everything that already exists and add something to it, then it makes a lot of sense.

Maybe it’s a proprietary security system, maybe a cryptographic computing system, maybe a new way of processing numbers, and more; we are very open to that. Our customers know I am willing to do all of these things, and they realize that if you change too much, you basically reset everything and waste billions of dollars of development. So they want to make the most of our ecosystem (minimizing that replacement cost).”

In fact, Nvidia’s two neighbors are already innovating on data center chips. AMD addresses AI and HPC workloads with its Instinct compute GPUs and chiplet-based EPYC processors, while Intel takes a multi-pronged approach: monolithic Habana processors for AI, the multi-chip Data Center GPU Max for AI and HPC, and multi-chip 4th Gen Xeon Scalable CPUs for everything else.

Nvidia’s H100, still a monolithic single-die design, remains competitive in AI, but that advantage is not impossible to erode, especially given its steep price. Cloud giants such as Amazon, Google, Meta and Microsoft have both the financial resources to develop custom data center chips and the technical ability to build the supporting software stacks for them. They were already moving in this direction to improve efficiency and cut costs; the arrival of AI has only accelerated the migration to custom silicon.

As cloud service providers lead the turn toward custom chips, the chips they produce will not serve only themselves but may also be opened up to other companies. In the long run, custom chips’ share will keep climbing while Nvidia’s shrinks, and the $2 trillion empire Nvidia has built today could crumble in an instant.

In his speech, Huang on the one hand stressed cost, the message being “custom chips are fine, but on a full accounting my chips are still the better deal”. On the other hand, he kept an open attitude toward custom silicon, moving first to soothe his restless customers: Price too high? Have a discount. Want customization? Tell me your needs first. Don’t worry, let’s sit down and talk it through.

That stance reflects Nvidia’s dilemma today. Remember Huang’s remark from the beginning of this article? Putting customer needs ahead of competitors is what let Nvidia dominate the GPU market for more than twenty years without defeat. But when the customers themselves become the rivals, things get awkward.

Seen this way, Broadcom and Marvell need not worry too much about Nvidia crossing into their lane. Once Nvidia pioneers its own custom chips, it will face the conflict between the B100 and those custom parts: which is better, and how do the different custom chips differ in performance? These are all questions Huang will have to wrestle with.

Final thoughts

Nvidia now spans a wide range of fields. Unlike in the past, when the Nvidia of 2008 perhaps only needed to watch AMD and Intel, the Nvidia of 2024 has several times as many manufacturers to watch, and none of them is a pushover.

In its recent filing with the Securities and Exchange Commission, Nvidia listed a crowd of competitors, Intel, AMD, Broadcom, Qualcomm, Amazon and Microsoft among them. Facing the aggression of so many giants, even the ever-calm Nvidia must be breaking a sweat; the road ahead is not as clear as it once was.

Perhaps what Nvidia really needs to do now is think this through, and once again refuse to make defeating its opponents the goal.
