NVIDIA Corporation (NVDA) Rosenblatt's Annual Technology Summit - The age of AI scaling (Part I) Conference (Transcript)

NVIDIA Corporation (NASDAQ:NVDA) Rosenblatt's Annual Technology Summit - The age of AI scaling (Part I) Conference Call June 7, 2023 12:00 PM ET
Company Participants
Simona Jankowski - Investor Relations
Ian Buck - Vice President and General Manager of Hyperscale and HPC, NVIDIA
Conference Call Participants
Hans Mosesmann - Rosenblatt Securities
Hans Mosesmann
Good morning. Good afternoon, everybody. Thank you for joining us for the NVIDIA fireside. Before we get started, Simona Jankowski is going to read us disclosures, and then we'll jump in. Simona?
Simona Jankowski
Yes. Good morning, and thank you very much, Hans, for hosting us. I just wanted to quickly remind the audience that our comments today may contain forward-looking statements, and investors are advised to read our reports filed with the SEC for information that relates to the risks and uncertainties facing our business.
Back over to you.
Hans Mosesmann
Simona, thanks. Well, we are delighted to have Ian Buck. It's been a couple of years. We couldn't talk last year; there were some conflicts. But Ian is a veteran. He runs everything that has to do with accelerated computing at NVIDIA, which is where all the interest is these days as it relates to AI. That includes all hardware, all software, all third-party enablement and marketing activities.
Ian is known for being, basically — I'll call him — the father of CUDA, which has led to a formidable moat around NVIDIA's business in terms of compiler technology, acceleration libraries, framework optimizations and so on. So there couldn't be a better person, I think, to talk about what's happening in the world of AI and how NVIDIA is playing its part in all of this. So Ian, welcome. How are you?
Ian Buck
Pretty busy, as you can imagine. It's been an amazing journey and an amazing few years since we last talked — actually, the last six months have been even more exciting. Definitely riding that exponential. So we're cooking here. Yes.
Hans Mosesmann
Great. How about we just start: you've been at NVIDIA for almost 20 years, and it's gotten even more intense in terms of what's going on. What's the state of AI today? I know there are lots of discussions on ChatGPT and generative AI. Just briefly, what's the state of AI in terms of NVIDIA's view and how you guys are participating in it? It's an open-ended question, but I think we'll start off with that, and we can go from there.
Ian Buck
Yes. Like I said, it's been a 20-year journey so far in accelerated computing as a whole. It started with making our GPUs more programmable, launching CUDA in 2006 and investing in that ecosystem since then — internally within NVIDIA, building up a software foundation and platform for accelerated computing, as well as working with everybody in the ecosystem to enable GPUs as a computing platform. That, of course, is a broad goal. There were certain markets — high-performance computing, simulation and others — that adopted first, and it's been broadening since.
And of course, since 2012, AI — we didn't go looking for AI; AI found us. But because CUDA was activated everywhere and available on every GPU, those researchers up in Canada working on neural networks with Hinton and others were able to find it and realize that the math was a pretty good fit for what we were doing in CUDA. So it took off from there.
There have been a couple of inflection points along the way for AI. The first one was the initial work by Alex Krizhevsky up in Canada, all the way back in 2012, in the ImageNet competition. That was the starting point. And really, that was AI for basic image recognition: what is this a picture of — a beach ball, a stop sign, a birthday party. That image recognition expanded into other forms of recognition: what is this statement about, what is the sentiment? Taking a review on a web page, or a tweet, and understanding whether it expresses positive or negative sentiment — understanding their content.
That concept of AI, that use of AI, was the initial production ramp that we saw. We probably all remember Jeff Dean talking about finding cats in videos as an example. And obviously, the hyperscalers, the cloud providers, social media and the Internet needed to understand their content. It was the first place where they could really turn AI on their data to understand what people were posting — reviews, products, et cetera.
The next shift in AI came as those models became more and more capable. Along the way, AI shifted from a recognition problem to a generative one — being able not just to understand content, be it text, speech, video, whatever, but to generate meaningful content: to create a product description that would make users want to click on a link. That started small and really hit with BERT, if you remember the original BERT model, which was the first transformer-based model. It had the ability not just to understand text, but to produce simple text statements and generate text.
In fact, NVIDIA was one of the first — we had BERT, if you remember, and it even got noticed in the markets — to identify this idea of a new kind of neural network, called the transformer, that could understand text. Prior to that, most AI applications were convolution-based. They were basically looking at neighborhoods of information and building up an understanding from localized data. That makes sense for image recognition: to recognize a face, you first recognize individual shapes in certain places — two circles for eyes, a nose, a line for a mouth — and you build up the notion of a face, which is localized.
Language is different. Language has all sorts of indirections. What I am speaking about right now is filled with pronouns and context that is only known from other parts of the text or speech, far away from what I'm saying right now, and you need those to understand it.
Transformers were based around this idea of attention — figuring out those distant relationships and incorporating them into a neural network. It started with BERT, invented by Google, and then took off from there. We've all heard of GPT; the T in GPT is transformer. It's the same idea. NVIDIA was obviously all over convolutions, image recognition, CNNs. We still do that — it's still a growing use case and an important one — but now transformers have taken over, including in some video, to understand distant relationships, and the primary use case for that is speech and human language understanding.
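As a rough illustration of the attention idea Buck describes — every token relating to every other token, no matter how far apart — here is a minimal sketch of scaled dot-product attention in NumPy. This is illustrative only, not NVIDIA's or any library's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Each query attends to every key, so distant tokens can influence
    # each other directly -- unlike a convolution, which only mixes a
    # local neighborhood of the input.
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq, seq) pairwise affinities
    weights = softmax(scores, axis=-1)        # attention weights per token
    return weights @ V                        # weighted mix of value vectors

seq_len, d = 8, 16                            # tiny example dimensions
Q = np.random.randn(seq_len, d)
K = np.random.randn(seq_len, d)
V = np.random.randn(seq_len, d)
print(attention(Q, K, V).shape)               # (8, 16)
```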
Speech and language are a hard problem. If you think about computer vision — dogs, cats, even bugs can do basic computer vision from a brain perspective, and you can get a highly tuned system to do a reasonable job. But only humans really have the gift of language, and it's built upon a deep understanding of knowledge. That was the next inflection point: transformers, which are about understanding knowledge and being able to connect knowledge with language — still mostly understanding, with a little bit of generation.
Today, we're in the generative AI era. Two areas kicked it off. One is obviously image generation — being able to describe a picture, like a teddy bear swimming in a big lake, and have it generate a picture of that. The other is generative AI like ChatGPT — being able to have a conversation, understanding what I'm saying, responding and extracting information. There's no database back there; it's one large neural network. With generative AI, we've moved from an era of recognition only — which is important; AI can look at my data, understand my content and inform my decisions — to one where AI itself can provide the content: produce text, engage with customers, generate images, help artists, optimize businesses, build new applications, build new kinds of services.
What's interesting also, though, is that unlike other revolutions — like the PC or mobile — where we all saw new kinds of applications, new kinds of platforms, new kinds of software, this one, generative AI, is actually making the old stuff more interesting. Look at Office 365 — Microsoft probably doesn't like me saying this — but Excel and Word weren't that interesting anymore. With generative AI, wow, they're way interesting again.
So in this revolution, we're seeing generative AI create new start-ups and new kinds of services, but it's also making the old stuff super interesting again, which is a fun double exponential. That's where we're at. I think we're really at the cusp, the very beginning, of generative AI. Everyone sees the opportunity — you guys do, the market does, the VC and investor community does — and you see the amazing start-ups being created. That's what makes this super fun right now: seeing all the different applications, new services and old, that are getting amplified and changed with AI.
Hans Mosesmann
Hey, Ian, the compute, I think a lot of people in the industry or observers talk about AI and the parameter complexity doubling every three, four, five months. What does that – how much longer do we have for that? And what are the compute implications for NVIDIA and for the industry for GPUs or custom ASICs or even….
Ian Buck
Yes. That's a great question; it's one I get asked a lot. So first off, to do generative AI, the AI has to have knowledge. Not just access to knowledge — certainly having access to knowledge, like a database it can query or pull from, is important, and most do. But the reason GPT is so big, or the Megatron 530 billion parameter model we trained on our supercomputer is so large, is that it has to capture human knowledge at some level in order to be a starting point for a generative model. That drives the model size up.
The other thing to understand is that it's not just model size. First off, model size tends to be limited by training time — the bigger the model, the longer it takes to train. People build models to the limit of what's practical, because they don't want to wait. And it's not just one training job. To build a model at that scale, you are constantly iterating on the model, on the data, on the tuning of the parameters to get it to converge to a level of intelligence. There are a lot of training runs that don't complete but inform the next one. So it takes many months, or years, to build a truly intelligent model. The final model is the eventual convergence of that whole effort.
The challenge tends to be training time. Nobody wants a training run longer than a month or two at most. Past that, it's just hard to innovate if you're waiting that long. So the size of the model tends to be a function of how much capacity they can put in place and how productive it is at scale. The general rule of thumb among deep learning researchers — the ones really developing this stuff, building foundation models — is that they don't want training jobs that take longer than that, because they're just too impatient.
So as we make faster GPUs, as we figure out how to connect them together faster with InfiniBand, build more optimized infrastructure, and do things like Grace Hopper and the new DGX GH200, their productivity increases. What they can train in roughly a month gets bigger — the model gets bigger because it gets more intelligent.
I will say one other thing, though: model size is only one metric, and model size is just measured in parameters. 175 billion is typical for GPT. We've trained 530 billion. There are 1 trillion parameter models behind closed doors. People are starting to get a little more secretive and not releasing these huge models, which obviously are pre-trained and are intelligence assets.
The other thing driving compute is model design — the design of the layers. We're continually tuning the intelligence at each layer, making each layer more optimized, more clever, which increases the complexity per layer. That isn't always captured in parameter counts, because each layer has a bunch of math and calculations in it instead of just being naive connections, if you will. Human brains, by the way, are similar: we have different kinds of neurons for vision processing versus auditory versus memory. So the layers become specialized in their design. The same happens in AI.
The other one is sequence length. I don't know if you guys have noticed, but the longer you play with something like ChatGPT, you can get it to forget the earlier conversation, and it will drift. That's a function of sequence length — how much of the conversation it can keep in its memory as it's having the conversation. Sequence length increases the compute significantly, on both the training and the inference side, and that isn't always captured in billions of parameters. It's just how much needs to be processed in order to carry an informed conversation forward. And then there's more diversification going on — we have lots of different models, from PaLM to LLaMA to GPT-4 — so we see different specialization happening.
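To make the sequence-length point concrete: in a standard transformer, the attention step compares every token with every other token, so that part of the cost grows roughly with the square of the sequence length. The sketch below is a simplified back-of-the-envelope estimate of my own, not a full cost model:

```python
def attention_flops(seq_len, d_model):
    # Rough FLOPs for one self-attention layer: forming the QK^T score
    # matrix and the weighted sum over V each cost about
    # seq_len^2 * d_model multiply-adds.
    return 2 * seq_len * seq_len * d_model

for s in (512, 2048, 8192):
    print(s, attention_flops(s, d_model=4096))
# Going from 512 to 8192 tokens (16x longer) raises this term ~256x,
# even though the parameter count hasn't changed at all.
```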
I expect, moving forward, the models will naturally want to get bigger because they can encapsulate more intelligence — we are definitely seeing that happen. We're seeing them integrate more deeply with databases, and AI being applied to the database and to information retrieval itself. Vector databases are super interesting — I could talk forever about that — and those are being tied directly to some of these large models. And now AI is working its way into the retrieval systems as well to inform them.
Then there's specialization happening. We're seeing multiple different models, specialization of different layers, and longer sequence lengths to keep the conversation more intelligent and keep the AI's working memory more adept — which is also significantly increasing compute requirements.
It's a chicken-and-egg situation, which we're trying to help with. I think that's what's driving it — every time we launch a new architecture, a new interconnect technology, or do new innovative things like Grace Hopper and DGX GH200, we expand the scope of what these researchers and developers, and NVIDIA's own researchers, can do to move large language models and generative AI forward.
The next chapter will probably be more about reasoning. Right now we're in generative AI; I can talk more about reasoning in the future, but that's kind of inevitably where we're going, and that's an even harder blue-sky problem.
Hans Mosesmann
Well, okay. So it looks like we're going to be in a growth pattern here for some time. For those that are listening, investors, participants, if you'd like to ask a question, just click on the question or Ask a Question button on the right of your screen. And it will come to me, and I can read it out for Ian to talk about.
There's a new metric. It's kind of interesting. I was talking to some contacts in Silicon Valley maybe six months ago or so, when the price of Hopper and the DGX Hopper was starting to come out. And it was really, really expensive. Some people were saying there's no way we're going to pay that kind of price for this kind of system — it's so much more expensive than, say, Ampere.
And yet here we are, and you're probably hand-to-mouth for the better part of this year, which brings to mind that the economics of some of these AI models, for training and inference, have little to do with the upfront price. The upfront price is less relevant; it's really the TCO aspect, the efficiency aspect, that comes into play. How does that determine how you come to market, how you architect your compute GPUs and so on?
Ian Buck
Yes. I really appreciate that question, and it's one that gets asked a lot across this entire community. The world sees pricing and sees sticker shock. And by the way, people usually don't realize what it takes to build a hyperscale data center and the cost that goes into it. These are multibillion-dollar investments — people are building data centers at scale, and I get to work with all the hyperscalers on that. So the productivity and the utility of compute is incredibly important to them, in order to improve their service, improve what they're doing, optimize their business and increase their revenue. Compute is critical to ad generation, to putting the right content in front of you, keeping those engagement scores high, and providing the products and services you want.
Nothing is more annoying than getting useless ads; getting ones that are actually the things you want and the information you need leads to revenue. It's critical. And while people may see the sticker cost of a GPU, we create the opportunity for all of them to invest and build those services — to turn AI into the opportunity it provides for them, based on computing on their data.
Specifically, though, when we talk about generation over generation — how we think about introducing a new GPU, new technology into the market — it is TCO. It is about how we revolutionize not just the compute capability, but also the TCO analysis of what you can do today with our existing products and tomorrow with the next one. Hopper provides 6x more compute performance at the transformer level — implementing that transformer layer — than Ampere does. 6x.
End-to-end on training, it's delivering 3x to 4x more performance — that's the throughput of a complete training job. And inference, even more; inference can be further optimized. So when we look at that, we look at more than just the cost of an individual node: what is the throughput of that entire data center going to be for them, based on what they have today and what they'll be able to do tomorrow. And we save them a ton of money by transitioning from one generation to the next, because the performance opportunity and the economic TCO are hugely in their favor in terms of the throughput and the productivity of the data center — those billion-dollar investments, the billions it takes to build those data centers all around the world.
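A tiny worked example of that TCO framing, using entirely made-up numbers (the dollar figures and job counts below are hypothetical, chosen only to show the arithmetic, not NVIDIA pricing):

```python
# Hypothetical numbers, purely to illustrate throughput-vs-sticker-price.
old_node_cost, new_node_cost = 150_000, 300_000     # dollars per node (made up)
old_jobs_per_year, new_jobs_per_year = 10, 35       # ~3.5x end-to-end throughput

print("old $/job:", old_node_cost / old_jobs_per_year)   # 15,000
print("new $/job:", new_node_cost / new_jobs_per_year)   # ~8,571
# Even at double the sticker price, higher throughput per node can cut the
# cost per completed training job, which is the data-center-level TCO
# argument Buck is making.
```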
That same story plays out in enterprise as well. By moving workloads from CPUs, or from previous-generation GPUs, to new GPUs, the throughput of the system, or the rack, or the data center at data center scale is often measured in X factors. Certainly for the transformer-based models, but we also look at the breadth of all the different workloads, including the models that are representative of image recognition — the benchmarks you see in MLPerf, for example.
MLPerf, if you guys haven't heard of it, is a benchmark. It was created by Google to provide a level playing field — a clean, clear benchmark representative of their training workloads. Since then, Meta has also been contributing their workloads. It's an honest benchmark that requires training to a specified level of accuracy or convergence, and we use it to measure our performance against the previous generation. You can see what Hopper has done compared to Ampere.
The other interesting point is that we don't stop after we ship it. We continuously invest in the software and the optimizations. Software is a massive part of what we do. I myself started as a software engineering manager at NVIDIA and have since hired thousands of software engineers and others across the company. One of the reasons I have this job now is the importance of software in what we do. It is our interface to the rest of the world — to the people consuming our technology, partnering on frameworks like PyTorch and JAX and TensorFlow and everything else in the rest of the stack, and to the end-user community. NVIDIA at this point has more software engineers than hardware engineers, by a good margin.
And so after we do the first round of benchmarking on something like Hopper, we continuously improve it. In fact, Ampere, over its life, got, I believe, 2.5x to 3x faster — from the first time we submitted it to the public MLPerf benchmarks to when we recently stopped submitting it as we shifted over to Hopper. You can see a 2.5x improvement in some of those models and use cases. That's what our users experience. I think it's why we have such a loyal community of users, both in the developer community and among our biggest customers: because we're continuously optimizing the whole stack and the platform, along with them, to improve the TCO.
Hans Mosesmann
Great. Ian, I did get a question here. It's an interesting one. Can you expand on the current issues with scaling sequence length and how that might be solved? There seems to be a push for new architectures that have more favorable scaling functions. Would this be a risk or opportunity for NVIDIA's advantage with its Transformer Engine?
Ian Buck
Yes, good question. So let me elaborate a little bit on that. Working with the customers and the users in the community — you can take a relatively small or a large model, and a larger input sequence length provides more context for the conversation moving forward. Measured in tokens, it started in the hundreds, it's now going to thousands, and they want to push it higher. That increases the compute complexity of the inference job and how you want to tune for training. Scalability is really important there. Capacity is important too: a longer sequence creates a larger working memory alongside a larger model. And there are multiple ways to address that, multiple ways to optimize.
First is Transformer Engine — you mentioned that. What Hopper did that was so revolutionary, so impactful, was that it made something called FP8 viable. FP8 is an 8-bit floating point representation — basically eight 0s and 1s to represent a floating point number. That's not a lot of information: roughly the number of characters on an alphanumeric keyboard, times two. That's about how many distinct values it can represent.
But if you can make training work at FP8, it's incredibly fast. Obviously, computing on 8 bits is faster than computing on 16. Also, the memory footprint is half of what you would have with 16-bit floating point, which is what we were using before. The Transformer Engine is specifically designed for this — you can't just cut the number of bits and expect it to work; you have to manage the dynamic range of the values very carefully in order to train successfully.
Transformer Engine with Hopper is actually a combination of both hardware and software to make sure that transformer models can train to convergence with only those 8 bits of information at the core computing unit. It was a ton of work — it consumed a massive amount of our own supercomputing capability — to make that work, to understand and figure out how to keep things within the range of those 8 bits.
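For intuition only, here is a toy sketch of what squeezing values into an FP8-style narrow format involves, and why per-tensor scaling matters to keep numbers inside its limited range. This is my own crude illustration, not the Transformer Engine algorithm; the 448 maximum corresponds to the common E4M3 8-bit format, and the rounding step is only a stand-in for its few mantissa bits:

```python
import numpy as np

FP8_E4M3_MAX = 448.0   # largest finite value in the E4M3 8-bit format

def fake_fp8(x, max_val=FP8_E4M3_MAX):
    # Per-tensor scaling: map the tensor's largest magnitude near the
    # format's max so the limited range is used well, round coarsely to
    # mimic the few mantissa bits, then undo the scale.
    scale = max_val / np.abs(x).max()
    coarse = np.round(x * scale * 8) / 8          # crude stand-in for 3 mantissa bits
    return np.clip(coarse, -max_val, max_val) / scale

x = np.random.randn(4, 4).astype(np.float32)
print(np.abs(x - fake_fp8(x)).max())              # small but nonzero rounding error
```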
I mention that in the context of sequence length because, by doing so, we reduce the size of the model and the size of the working set. It can fit more in a 96-, 94- or 80-gig GPU, depending on which flavor of Hopper you have. And of course, it keeps the response time — how fast it can respond to a question — within a usable range. After that, we scale. If we need more GPU compute to expand further, we scale with NVLink. On Hopper, NVLink is 900 gigabytes a second, which is a lot — roughly 7x more than what you get with standard PCIe for connecting devices inside a system. That lets us basically combine two GPUs into one: we split the model and execute it in parallel across two GPUs. You need that much bandwidth between the GPUs to keep things going — to allow the GPUs to operate as one, split the model, and stay within the latency response envelope.
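A minimal sketch of the idea of splitting one layer across two GPUs — tensor parallelism in spirit. This toy NumPy version only imitates it on one machine; in a real deployment the shards live on separate GPUs and the final concatenation is a collective over NVLink, which is exactly the step whose bandwidth Buck is describing:

```python
import numpy as np

def split_linear(x, W, n_devices=2):
    # Split the weight matrix column-wise across "devices"; each device
    # computes its slice, then the partial outputs are gathered.
    shards = np.array_split(W, n_devices, axis=1)
    partial_outputs = [x @ shard for shard in shards]  # one matmul per device
    return np.concatenate(partial_outputs, axis=-1)    # the NVLink-bound gather

x = np.random.randn(1, 1024)
W = np.random.randn(1024, 4096)
assert np.allclose(split_linear(x, W), x @ W)          # same math, just split
```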
If you need more, we can go from two GPUs with the H100 NVL product — which is actually two PCIe cards on a bridge — to 8-way: we have the HGX system that goes across 8. Beyond that, we can use InfiniBand, or we can go all the way to our DGX GH200, which is 256 GPUs all connected, as I just announced at Computex two weeks ago.
The other thing is model size. How can we do bigger models — or smaller models with longer sequence lengths — that don't need extreme latency and could be served by a single Hopper in terms of performance? For that, we have Grace Hopper. We announced it and have been talking about it at GTC; if you haven't seen our GTC conference, you should check it out. Grace Hopper is basically a 600-gigabyte GPU. We combine the GPU, which has upwards of 96 gigabytes of HBM memory, with the CPU over NVLink, so the GPU can take advantage of all of the CPU memory, which operates at upwards of 500 to 600 gigabytes a second. So we now have, effectively, a 600-gigabyte GPU, and that also helps with serving larger models and longer sequence lengths.
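Rough back-of-the-envelope arithmetic (my own numbers, for illustration) on why that extra capacity matters: the weights alone of a GPT-3-class model do not fit in a single GPU's HBM at 16-bit precision, which is where the larger pooled memory, or FP8, comes in:

```python
def weight_gb(params_billions, bytes_per_param):
    # Memory for the weights only; activations and KV cache add more on top.
    return params_billions * 1e9 * bytes_per_param / 1e9   # gigabytes

for params in (175, 530):
    print(params, "B params:",
          weight_gb(params, 2), "GB at FP16,",
          weight_gb(params, 1), "GB at FP8")
# 175B parameters need ~350 GB at FP16, versus ~80-96 GB of HBM on one
# Hopper -- hence splitting across GPUs, FP8, or reaching into CPU memory.
```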
There are lots of ways to do this — and actually, I pity your community having to do all this analysis. It's becoming a complex matrix of model size, latency requirements and sequence length. We're blanketing the space. That's why we're creating so many different variants, different products: Hopper in a PCIe form factor, Hopper in an NVLink HGX form factor, two PCIe cards bridged together, and now Grace Hopper — all of which can be used to deploy inference at scale.
Hans Mosesmann
Wow, that's a good answer.
Ian Buck
Yes. I apologize — this is what I do every day: working with each of those hyperscalers and startups and everyone else to dial it in and create new products to address it.
Hans Mosesmann
And so, it looks like because you're blanketing the market with various types of products, it can counter some of the different proprietary or new architectures that have emerged out there that are being proposed. Is that kind of like what you're saying?
Ian Buck
Yes. There's not just one tick in NVIDIA's road map anymore. It used to be: here's our Pascal GPU; three years later, here's Volta; three years later, here's Ampere, the A100. What we've been working on is diversifying the ways in which we can add value. Now we build CPUs, GPUs, DPUs. We work on InfiniBand and Ethernet, and we make both of those platforms AI-capable for different paths.
Then we can play with and optimize how we connect all these things together and build up different products — even within one GPU generation — and meet the demand wherever it wants to go based on where AI is going. That agility is really important in AI; things are being invented all the time. And NVIDIA, being the one AI company that works with every AI company, sees the different aspects of what people need, and can dial that in and bring it to market — perhaps at the peril of our partners, who are now trying to meet all this demand and optimize those workloads. That's why you're seeing all these products.
The other thing I'll say is that we've also accelerated our GPU roadmap. We used to do the 100-class GPUs every three years. We're now down to a two-year, or in some cases 18-month, cycle. Jensen has talked about Hopper Next and that timeline. In addition, it's timed with Grace Next and with Quantum Next for the interconnect, and we've accelerated that now. So we're now able to invest and execute on that every two years or 18 months, depending on how you look at it.
Hans Mosesmann
Wow, that's good to know. We have a bunch of questions that just came in, and we don't have a lot of time. This is a tactical one — I'm not sure if you can answer it; maybe Simona can come in. Can you please talk about efforts to source supply for the second half of the year, and how does NVIDIA define "significant" as mentioned in the latest conference call?
Ian Buck
Yes, Simona, you can comment a little bit more on the conference call details and I can follow.
Simona Jankowski
Sure. Happy to do so. I hope you guys can hear me okay.
Ian Buck
Yes.
Simona Jankowski
So we've commented on the earnings call that we are going to have substantially higher supply in the second half relative to the first half of the year. And that essentially backs up the extended demand visibility that we see stretching out a few quarters into year-end as well. As we commented, we have seen a pretty steep increase in demand through the quarter all the way leading up to the current time. And so we're working closely with customers to ensure that we have supply for them. That also helped underpin the strong guidance we gave for the second quarter, and then even with that higher baseline in the second quarter, we commented on substantially higher level of supply second half versus first half. We haven't been more granular on the exact linearity between Q3 and Q4. So just give us a bit of time as we get closer to the back half of the year, we'll be able to provide guidance quarter-by-quarter.
Hans Mosesmann
Okay.
Ian Buck
Yes. I don't have much more to add to that. Our biggest customers, of course, plan with us. Everyone is swarming to generative AI, and as part of that, we are doing that planning with them and will continue to do so, alongside all the things Simona mentioned.
Hans Mosesmann
Okay. Here's another question. Along the same lines, maybe you can answer this. What is the biggest bottleneck for NVIDIA or GPUs more broadly? How much time do you think it'll take for the industry to build a sufficient inventory or supply levels?
Ian Buck
Well, I'm not going to comment on specific bottlenecks from the supply standpoint. I think the challenge — or bottleneck, perhaps, though it's not really a bottleneck, it's just where things are going — is the broadening of further adoption. We're now seeing the enterprises pick up AI. And for that to happen, the providers of AI, including NVIDIA, need to meet the enterprises where they are.
Some of them have their own AI expertise — acquired, hired, brought in-house, or built by working closely with a startup or others — to adopt AI to influence or improve their business. You see that in some of the large language model startups, for example: providing that capability, like the work Adept is doing, making older software easier to use with an AI that can click all the buttons and check all the boxes on top of the existing software. It's a great way of doing those things.
But it's also about meeting the needs of those enterprises where they are — perhaps as a service, or by taking a pre-trained model and fine-tuning it into something useful, where really the only thing the enterprise needs to do is provide the right kind of data to convert a pre-trained model into their own virtual assistant or chat capability. A lot of the activity right now is about helping them adopt AI into their workflows and their products. Some of the work we've been doing on our own NeMo product, for example, is exactly that, and it's shockingly easy to use. You can provide a few hundred up to maybe a thousand or two thousand examples — text in, text out — and it can fine-tune a GPT model, all the way up to 175 billion parameters or larger, to answer questions in that format, in that context.
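Just to illustrate the "text in, text out" shape of such a fine-tuning dataset — this is an illustrative format and hypothetical example content of my own, not NeMo's actual schema or API:

```python
import json

# A couple of hypothetical prompt/completion pairs; a real customization run
# would use a few hundred to a couple of thousand of these.
examples = [
    {"input": "Customer: My order arrived damaged. What should I do?",
     "output": "Sorry to hear that. Please reply with your order number and a photo, and we'll ship a replacement."},
    {"input": "Customer: How do I reset my password?",
     "output": "Use the 'Forgot password' link on the sign-in page; a reset email should arrive within a few minutes."},
]

with open("finetune_examples.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
# A file like this is then fed to whatever customization workflow the serving
# stack provides to adapt a pre-trained model to the enterprise's domain.
```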
Instead of just asking a generic ChatGPT question, which gets you a generic answer — how a generic human, or the Internet at large, would answer it — you can have it answer like a financial expert or a support-call expert, or other such things, by connecting AI with information retrieval systems.
So when you ask a question, you don't just get an answer that may or may not be right — we can all make ChatGPT lie, or write code that looks right but is made up — you actually get to the source of the information. You see a little of that with the work Bing is doing. More broadly, generative AI is more useful if it can not only generate an answer, but tell you where the sources are, so you can explore the results further.
So that democratization — connecting all those GPUs with industry — is a big push right now, and you're starting to see the early movers in that space. That's where the next wave of GPU usage, and also revenue from services, things like our DGX Cloud efforts as well as our partners', is going to move the needle.
Hans Mosesmann
Last question — we've got a minute; let's see if we can keep it to a minute. How have conversations with your biggest clients, hyperscalers or enterprises, changed after your last earnings call? It seems, from what we hear, that it was a real wake-up call for many decision makers on making serious investments in AI. So the question is: how quickly has this changed since the conference call, which was basically two or three weeks ago?
Ian Buck
I don't know that it's changed since the conference call. It certainly changed with the ChatGPT moment and the generative AI moment, and it probably only continues to be amplified by the activity they're seeing in the street. They see the opportunity for generative AI in every one of their services and every one of their capabilities, and they see NVIDIA not just as a supplier of GPUs — we are a supplier of infrastructure — but as a partner at many levels. We always were a partner in their hyperscale efforts: the development of their servers, the design of their data centers, how to build something even more capable, optimized, power-efficient and at scale. Every one of them has unique challenges and capabilities and their own technology that they contribute, working with NVIDIA to make it all work well.
Take Amazon's EFA, the Elastic Fabric Adapter — we've done a lot of work to make sure that can work at scale. Other hyperscalers have their own fabrics, or they use InfiniBand, and we work with them to scale out InfiniBand or their Ethernet platforms. We've always been a data center partner; that's only amplified, obviously, and continues. Now it's also about Grace and Grace Hopper and CPU land and what we can do in that space, which is very exciting.
The step up has been on the software side. All the capabilities in the software and the infrastructure, integrating into all the different frameworks and core capabilities, broadening across all their services so all their developers and researchers can get access to that infrastructure — that's only amplified. We always were a partner with them on PyTorch and TensorFlow and JAX and others. That certainly has continued and grown.
What we're seeing now is more and more of their service groups seeing NVIDIA as a partner to optimize for the latest platform and for what we have to offer in generative AI — seeing the opportunity to do more with less, moving workloads that were still on CPUs or using simplified AI to much more intelligent, larger models, to improve the quality of service and have a better interaction with the device, whether it's talking to a hockey puck on your kitchen counter or to the cloud. And we're also a partner with them to optimize those models.
That 2.5x to 3x we got out of the A100 wasn't just us working in the back office; it came from optimizations we were doing because our biggest customers were giving us challenges, working side by side with us to optimize those workloads, which obviously gets reflected in things like benchmarks and elsewhere. That has been amplified. And certainly our engagements with their service teams — deploying AI, figuring out how to use Hopper at scale, doing inference better and more efficiently, moving more of their workloads onto the GPU infrastructure they have, and planning for growth — that has definitely gone up quite a few ticks.
Hans Mosesmann
I can imagine. Well, Ian, thank you so much — very enlightening. It looks like you're going to be hiring another 1,000 software engineers; hopefully, you don't have to do all the interviews yourself. But exciting times. Simona, thank you as well. We look forward to the group session later this afternoon. Have a great day, and thanks.
Ian Buck
Thank you. And hopefully, see you again in person.
Hans Mosesmann
You bet.
Simona Jankowski
Thank you. Bye-bye.
Ian Buck
Bye, Hans.
Question-and-Answer Session
End of Q&A