This week, guest Brian DeVincentis, M-Star’s Lead Algorithm Developer, joins John to discuss the question, “What’s the big deal about GPU computing?”
“It’s an exciting area of development and it’s going to be neat to see how the hardware ultimately ends up informing the algorithms – and vice versa.”
In this episode, John and Brian discuss:
- The differences between CPU and GPU computing
- What types of algorithms run on GPUs (and what types don’t)
- Hardware and software limitations
What are your thoughts on GPU computing? Listen to the podcast below and contact us to join the conversation!
John Thomas: We’re talking today about GPU computing and some of its benefits as related to computational physics and computational physics modeling. So the real question we’ll be discussing today is: What’s the real, tangible benefit of GPU computing?
To help answer these questions, we have Brian DeVincentis, who is our lead algorithm developer here at M-Star CFD, to talk about GPU computing. For the last couple of years, Brian has been helping us port our codes to GPUs, and he’s seen some pretty neat speed improvements as a result of that work.
Brian, thanks for being here. Do you want to take a minute and introduce yourself?
Brian DeVincentis: Hi, John. Yeah, thank you. My name is Brian DeVincentis. I work at M-Star, and I work a lot on the software development and GPU computing for our physics software.
John Thomas: So, Brian, tell us how you got to where you are today as an expert in GPU-based computing. Tell us where you started, some of your educational trajectory and how you picked up some of the skills you’re using today.
Brian DeVincentis: Yeah. So I started with computer programming and physics back in high school, and that’s really when I started getting interested in it. Then as I went to college, I majored in mechanical engineering at Carnegie Mellon. And at the same time, while I was learning about physics and engineering and transport physics during my undergraduate degree, I was also constantly learning about and improving my skills at computer programming. So towards the end of my degree there, I combined these two interests of mine and started to go in a direction that combined physics and engineering with computer programming. And so I kind of went into this area of numerical methods and numerical simulations of physical processes. And it was at that time that I really started getting some experience with CPU programming, and it was actually lucky timing: it was the time at which GPUs were really starting to catch up to and beat the performance of CPU-based computing.
John Thomas: Mm hmm. This would have been, what, the early twenty-teens-ish?
Brian DeVincentis: Mm hmm. Yeah, kind of the mid-2010s. Yeah.
John Thomas: Right on. And now you spent some time in Texas working in an HPC center, right?
Brian DeVincentis: I did. That was really my first work experience and research experience doing this sort of numerical work, molecular dynamics. Not GPU computing, but I was still learning some of the numerical methods behind molecular dynamics and running simulations and that sort of thing.
John Thomas: Right on. And I remember you told me about a senior design project you did where I think you used lattice-Boltzmann algorithms to model a swimmer or fish or some such?
Brian DeVincentis: Oh, that’s right. Yes. So that was my first lattice-Boltzmann project. We did a 2D simulation of different swimming motions using, like I said, lattice-Boltzmann. And that was, you know, kind of the first fluid simulation that I had ever programmed.
John Thomas: Right, right. And that was CPU stuff when you’re coming out of college, right?
Brian DeVincentis: Mm hmm. Yes.
John Thomas: And so you mentioned that, you know, GPUs kind of came on the scene maybe eight, ten years or so ago. So just for general knowledge, what is a GPU? I know what a CPU is. I have one in my computer, you know? You buy that from Intel. What is a GPU and how is it different?
Brian DeVincentis: Yes, so GPUs, they’re a significantly different piece of hardware than a CPU. So CPUs you can think of as sequential processors. And what they do is they take instruction by instruction and they execute them one at a time.
GPUs, on the other hand, what they do is they take an instruction and execute it on a whole large set of data. And so they do their processing in parallel. And so that’s kind of structurally the difference between a CPU and a GPU.
John Thomas: So, so a CPU does one thing at a time. A GPU does many things in parallel – like tens, hundreds, thousands of things in parallel?
Brian DeVincentis: Yeah, exactly. So, you know, your CPU might have some vector capabilities, you know, on the order of maybe a dozen or so, or dozens, whereas a GPU can perform its calculations on thousands of compute cores at one time.
John Thomas: Got it. Is this similar to running in a multi-CPU environment or somehow better?
Brian DeVincentis: It’s similar, but the chips – so a GPU chip has been specifically designed for this purpose. It’s been designed to do many, many very simple calculations, very efficiently, whereas CPUs have been designed to do fewer calculations at once, but maybe do more complex calculations in a more efficient way.
John Thomas: So a GPU is really good at doing the same simple calculation over and over and over again on a data set.
Brian DeVincentis: Exactly.
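That distinction between the two mental models can be sketched in a few lines of Python. This is a conceptual illustration only, not actual GPU code: the explicit loop mimics a CPU stepping through elements one at a time, while the NumPy expression applies one instruction across the whole array at once, which is the data-parallel pattern a GPU executes natively.

```python
import numpy as np

# Conceptual sketch only (no GPU required): a CPU-style sequential loop
# versus the data-parallel pattern a GPU executes natively.

def scale_sequential(x):
    # CPU mental model: one element at a time, instruction by instruction.
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = 2.0 * x[i] + 1.0
    return out

def scale_parallel(x):
    # GPU mental model: the same instruction applied to every element
    # at once (NumPy dispatches this as a single vectorized operation).
    return 2.0 * x + 1.0

data = np.arange(8, dtype=np.float64)
print(scale_sequential(data))  # same result either way
print(scale_parallel(data))
```

Both functions compute the same thing; the difference is purely in how the work is expressed, one element at a time versus one operation over the whole data set.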
John Thomas: So Brian, let’s talk brass tacks here. You know, how fast is the GPU relative to a CPU? If I have a good, you know, state-of-the-art scientific GPU on my desktop computer, what type of performance increase could I realize relative to CPUs on my computer?
Brian DeVincentis: Sure. So it is a tough comparison to make. Probably the best way to do it is to look at cost. Cost is probably what I would look at. And what you can find, and again, like we were talking earlier, it depends on the exact algorithm you’re talking about and how well it fits on the GPU, right? But if your problem fits well on the GPU, you can be looking at one, two, even a little bit more than that, in performance improvement with the GPU relative to the CPU, you know, spending approximately the same amount of money.
John Thomas: You mean one to two orders of magnitude?
Brian DeVincentis: Orders of magnitude. Sorry. Yeah.
John Thomas: So that’s significant. I know with our software, a good scientific GPU, which might cost five or eight thousand dollars, can run at the same speed as maybe a 256- or 512-CPU HPC. And the good GPU, as you mentioned, is seven thousand bucks; an HPC of that size would be several hundred thousand dollars.
And I think that asymmetry is worth noting and I think why GPUs are getting so much attention in the scientific computing community.
Brian DeVincentis: Yeah. And one thing I’ll add to that is that performance improvement, I think you can break it down into two different components.
Part of it is that it just can process more per buck, right? So the memory bandwidth, the number of FLOPS your chip can do per dollar, is just higher compared to a CPU. And that’s part of it.
The other part of it is that all your computation is happening closer together. It’s all happening on one chip. And so the syncing of information that needs to happen, the communication between your different computational cores and the moving around of memory, is all a lot faster because it’s on the same piece of silicon. It doesn’t have to be communicated across this whole complex system of servers and switches and that sort of thing.
John Thomas: Yeah, and that communication latency goes to, well, approaches zero, relative to what you’d see in a 512-CPU HPC environment.
Brian DeVincentis: Exactly, yes.
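The on-chip versus cross-node point can be made concrete with a back-of-envelope comparison. The bandwidth figures below are illustrative assumptions, not measurements of any particular hardware: on-board GPU memory bandwidth is typically hundreds of GB/s, while a node-to-node cluster interconnect is closer to tens of GB/s.

```python
# Back-of-envelope sketch of why keeping computation on one chip matters.
# Both bandwidth figures are illustrative assumptions, not measurements.

gpu_memory_bandwidth_gb_s = 900   # assumed on-board GPU memory bandwidth
interconnect_bandwidth_gb_s = 25  # assumed node-to-node cluster link

data_gb = 10.0  # assumed amount of field data to move each step

on_chip_time_s = data_gb / gpu_memory_bandwidth_gb_s
cross_node_time_s = data_gb / interconnect_bandwidth_gb_s

print(f"on-chip move:    {on_chip_time_s * 1e3:.1f} ms")
print(f"cross-node move: {cross_node_time_s * 1e3:.1f} ms")
```

Under these assumed numbers the same data movement is over an order of magnitude cheaper on-chip, which is the asymmetry behind the latency point above.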
John Thomas: So with that in mind, you know what types of algorithms run well on GPUs? I imagine that because they are, I’d say, scoped and tailored to handle comparatively simple operations, does that limit the breadth of applications being run efficiently in a GPU environment?
Brian DeVincentis: Yeah. So the types of algorithms have to fit your hardware. So if you have a very sequential algorithm where each step has to depend on the last step, then you’re looking at a CPU algorithm. If you have a whole bunch of computational work that can be done at one time and the algorithm is being applied, it’s the same algorithm being applied to every piece of data that you have, then that’s where the GPUs come in.
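That split can be illustrated with a toy example. The recurrence below is inherently sequential because each step cannot start until the previous one finishes, while the elementwise map has no such dependency, so every element could in principle be computed simultaneously. The function names here are illustrative, not from M-Star’s code.

```python
import numpy as np

def sequential_recurrence(x0, n):
    # x_{i} = 0.5 * x_{i-1} + 1: step i depends on step i-1, so the
    # steps must run one after another (a CPU-style workload).
    x = x0
    for _ in range(n):
        x = 0.5 * x + 1.0
    return x

def data_parallel_map(xs):
    # Every element is independent of every other, so all of them
    # could be computed at the same time (a GPU-style workload).
    return 0.5 * xs + 1.0

print(sequential_recurrence(0.0, 3))           # 0 -> 1 -> 1.5 -> 1.75
print(data_parallel_map(np.array([0.0, 4.0]))) # [1. 3.]
```

The arithmetic is identical in both cases; what differs is the dependency structure, and that structure is what decides which piece of hardware the algorithm fits.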
John Thomas: Right, right. So I’ve seen applications, obviously, like the lattice-Boltzmann method we apply to solving the transient Navier-Stokes equations. So, OK, so GPUs have this neat capability to handle a lot of information at once. And outside of scientific computing, I know they’re applied to things like machine learning and other sorts of artificial intelligence. Do you think this pivot towards massively parallel but simple hardware is part of a broader megatrend here? Or do you think this is something of a one-off technology? Do you think in the next ten years we’ll be doing more of this type of computing? Or do you think that CPUs will be modified in a way that supplants the need for this unique type of GPU architecture?
Brian DeVincentis: Well, my guess would be that, you know, every different piece of hardware is going to find its niches. You know, up until now, CPUs have been the only form of computing that has been available. You’re either computing on a CPU or not really at all.
GPUs are taking a chunk of that away, and they’re going to be dominant in certain applications, right? And I think as time goes on, other forms of computing will arise and, you know, find their niches and chip away at CPU or GPU computing as well.
John Thomas: That’s a good perspective. So that being said, just so we understand, you mentioned serial computing being very useful for CPU-based architectures. Are there other hard limits on GPU computing, things that we should be aware of or things that we know just don’t run well on the GPU?
Brian DeVincentis: One point I would make is that while some algorithms can be programmed to fit on a GPU, it is also harder to do so. CPU algorithms are generally easier to program. In a GPU environment, you have a lot more to think about as you’re writing your code. Part of that is due to the parallelism: you have all these different calculations going on at once, so you have to think about synchronization between different parts of your code that are operating at the same time, and you have to think about memory management in a much more detailed way than you would in a CPU environment.
And so really, beyond just the hardware, there are software limitations as well. And, you know, those can be overcome over time, possibly by developing better libraries, better software guidelines and that sort of thing.
John Thomas: So it sounds to me like GPUs are great. They have a lot of functionality, but some software just might not run well on them. That is, porting a CPU-based code to a GPU won’t automatically give you this 100x speed improvement. You’ve got to have an algorithm and a framework that are amenable to the architecture in order to realize the performance these GPUs can promise.
Brian DeVincentis: Exactly. Or it may just take a lot of work to make those changes. It may take years to rewrite, or rewrite large parts of, software to make it work well on the GPU. And so really, over the past handful of years, we’ve had huge improvements in GPU hardware, and I don’t think the software has kept up. It’s going to be years before the software is fully taking advantage of the chip hardware that’s out right now.
Yeah, one good example of that is reading and writing data to and from your disk. Right now, for example, for M-Star, our simulations produce a lot of data output and write it all to the disk, right? And in post-processing all that data, a lot of the bottleneck is actually reading and writing the data from the disk and then doing subsequent operations on it.
And so something NVIDIA is actually working on right now is being able to move data directly from your disk to your GPU and back, without having to go through your RAM and your CPU on the way.
John Thomas: Interesting. That’d be a pretty important pivot in terms of our data management and data shuffling.
So let’s say I’m new to GPU computing. Where would I go to get started? Do you have a favorite guidebook you like to reference, or a favorite go-to source for getting familiar with how these things work and what sort of algorithms we might write for a GPU?
Brian DeVincentis: Yeah, there’s a handful of different programming platforms in which you can get started. There’s quite a bit of support for Python programming on GPUs. That might be kind of the easiest place to start; a lot of people are familiar with Python programming. And there are some libraries. Numba is one, PyCUDA is another. Some of these libraries make it pretty easy to get up and running with some basic GPU programming.
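As a taste of the Python route mentioned here, below is a minimal sketch using Numba’s `@vectorize` decorator, which compiles one scalar rule and applies it across a whole array, the same data-parallel pattern discussed above. It assumes Numba is installed; the `except` branch falls back to plain NumPy so the snippet still runs without it. Note that targeting an actual GPU with Numba additionally requires a CUDA-capable card; by default this compiles for the CPU.

```python
import numpy as np

# Getting-started sketch: one scalar operation, applied independently
# to every element of the input arrays. Falls back to plain NumPy if
# Numba is not installed, so the script runs anywhere.
try:
    from numba import vectorize

    @vectorize(["float64(float64, float64)"])
    def saxpy_like(x, y):
        # The same simple calculation, repeated over the whole data set.
        return 2.0 * x + y
except ImportError:
    def saxpy_like(x, y):
        return 2.0 * x + y

a = np.arange(4, dtype=np.float64)  # [0, 1, 2, 3]
b = np.ones(4, dtype=np.float64)
print(saxpy_like(a, b))             # [1. 3. 5. 7.]
```

The appeal of this style is that the scalar function is all you write; the library handles spreading it across however many compute cores the target hardware has.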
John Thomas: That’s great, and something worth looking up to get a leg up.
Brian, this was great. I appreciate your time in discussing some of these points with us today. Like I was saying, it’s an exciting area of development, these new architectures and new hardware, so it’s going to be neat to see how the hardware ultimately ends up informing the algorithms, and vice versa. So thanks again for being here to talk us through some of these interesting points.
Brian DeVincentis: Yeah, yeah. No problem. Thank you.