Hello and welcome to the wild world of fast Fourier transforms and the cuFFT library, specifically its performance and features. Let's talk about fast Fourier transforms. The underlying idea goes back to Gauss, but the modern algorithm was formalized in a 1965 paper by Cooley and Tukey. If you look at the diagram on the left, a signal in the time domain is carried into the frequency domain by a forward transform, where it can be decomposed into a set of component frequencies; those components can then be combined and inversely transformed back into the time domain, and that's how you solve these problems. Why should you care about this? Well, unless you're really good at math, working these problems out analytically is pretty hard, and from the perspective of time to solve, a direct discrete Fourier transform takes on the order of n squared operations, which doesn't scale well. The fast Fourier transform brings that down to O(n log n), which is much faster. That might seem minor, but it really does make a difference when you're dealing with things like continuous video or continuous audio and you want to make those calculations inline, with no major delay between the time a signal comes in and when it goes out. This is often what high-quality hardware is doing. Why do we care about doing this on a GPU versus a CPU? NVIDIA claims these computations can be up to 10 times faster using the cuFFT library than a CPU implementation, but let's just think about the problem itself: a single signal that needs to be decomposed into a series of component signals. What is the main reason you use GPUs? Well, you decompose large problems into smaller problems, so it seems like a natural fit. Another thing that makes GPUs really good for this, especially if speed is your concern, is that they were built for video cards, and what a video card does is rapidly render scenes, video, things like that. They're built for continuous streams of data. It's a perfect fit.
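The forward/inverse round trip described above can be sketched with NumPy as a CPU stand-in (cuFFT exposes the same mathematical operations on the GPU). The two sine frequencies here are hypothetical values chosen just for illustration:

```python
import numpy as np

# Hypothetical test signal (not from the talk): two sine waves, at 5 and
# 20 cycles over the sample window.
n = 1024
t = np.arange(n) / n
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# Forward transform: time domain -> frequency domain, in O(n log n).
spectrum = np.fft.fft(signal)

# The two strongest frequency bins are the 5- and 20-cycle components:
# the signal has been decomposed into its constituent frequencies.
peaks = np.argsort(np.abs(spectrum[: n // 2]))[-2:]
print(sorted(peaks.tolist()))  # -> [5, 20]

# Inverse transform combines the components back into the time domain.
recovered = np.fft.ifft(spectrum).real
print(np.allclose(recovered, signal))  # -> True
```

The round trip is lossless up to floating-point error, which is why the same pipeline works for audio and video processing: transform, manipulate in the frequency domain, transform back.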
What are some of the features of cuFFT? First, it's packaged along with the other libraries in the NVIDIA toolkit that comes with CUDA. It handles multi-dimensional transforms, from one-dimensional transforms, which are fairly easy, up to very complex n-dimensional transforms. Of course, the higher the number of dimensions, the more complex and the slower the transform, so there is a relationship between dimensionality and the speed of computation. More recently it has started to support multiple GPUs, which is really helpful if you have a cluster, or if you set up a rig for video processing and need to perform these transformations. The transforms it handles are real-to-complex, complex-to-real, and complex-to-complex, and that's very important because not all of the computation lives in the real domain. It also has three major processing modes. The first is batch, which works when you're not processing data in real time but want to process it more efficiently. The second is streaming, which is very helpful when you're trying to process data in near real time, and there are many, many use cases for that. The last is asynchronous processing: if you have a bunch of signals coming in from, say, a file or a network call, you can send them out and get them back as they're processed, so it sits between batch and streaming.
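The batch idea above can be illustrated in NumPy on the CPU: many independent 1-D signals are transformed in one call instead of a loop, which is how cuFFT amortizes plan setup and kernel-launch overhead on the GPU. This is a sketch of the concept only; the batch size and signal length are made up, and real cuFFT code would build a plan with the C API and a transform type such as CUFFT_R2C:

```python
import numpy as np

# Hypothetical workload: 8 independent real-valued signals of 256 samples.
batch, n = 8, 256
rng = np.random.default_rng(0)
signals = rng.standard_normal((batch, n))

# Batched: one call transforms every signal along the last axis
# (real-to-complex, analogous to a batched CUFFT_R2C plan).
batched = np.fft.rfft(signals, axis=-1)

# Equivalent but slower: one transform per signal in a loop.
looped = np.stack([np.fft.rfft(s) for s in signals])

print(np.allclose(batched, looped))  # -> True
print(batched.shape)                 # -> (8, 129), n//2 + 1 bins per signal
```

The same trade-off drives the streaming and asynchronous modes: keep the GPU fed with work so the per-transform overhead disappears into the pipeline.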