If you’re reading this blog, then chances are you’re a developer who has spent more time wrestling with GPU infrastructure than you’d care to admit. You’ve likely experienced the ever-shifting landscape of serverless GPU offerings, where the stakes are high. Really high. For instance, renting a single A100 instance on Google Cloud can set you back $32,172 a year, more than double the $15,080 a full-time worker earns in a year at the federal minimum wage. And let’s face it, a cloud provider that doesn’t scale with your project can become a huge PITA that leads to hours of config. That’s why choosing the right GPU cloud provider at the beginning of your project is crucial. To save you the headache of choosing poorly, let’s take a look at the benefits of serverless GPUs and traditional GPU instances.
Your wallet would love it if your project kept the same GPU 95% busy all day, every day. Unfortunately, real workloads are unpredictable: an inference server can sit idle for hours, driving up your bill, before it needs a sudden burst of compute. Not to mention, your GPUs continue to rack up cost while you debug. Few things are more stressful than paying for every second you spend adding print statements to your code. Luckily for serverless GPU users, providers provision GPUs on demand during program execution, reducing idle time and thus saving you money.
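To put a rough number on idle time, here is a minimal sketch that works out what each hour of actual work costs on an always-on instance. The hourly rate is derived from the $32,172-per-year A100 figure quoted earlier; the utilization levels and the function names are illustrative assumptions, not measurements or anyone's billing API.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; an always-on instance bills for all of them

# Annual A100 price quoted earlier in this post.
ANNUAL_A100_COST = 32_172.0


def hourly_rate(annual_cost: float) -> float:
    """Convert an annual always-on price into an hourly rate."""
    return annual_cost / HOURS_PER_YEAR


def cost_per_useful_hour(rate_per_hour: float, utilization: float) -> float:
    """Effective price of one hour of real work when the GPU is busy only
    `utilization` (0 < utilization <= 1) of the time it is billed."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return rate_per_hour / utilization


if __name__ == "__main__":
    rate = hourly_rate(ANNUAL_A100_COST)
    # Hypothetical utilization levels, chosen only for illustration.
    for utilization in (0.95, 0.50, 0.10):
        effective = cost_per_useful_hour(rate, utilization)
        print(f"{utilization:.0%} busy: ${effective:.2f} per useful GPU-hour")
```

The point of the sketch is the division: the lower the utilization, the more each productive hour costs, which is exactly the waste that on-demand provisioning is meant to remove.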
While you may waste money by optimistically assuming that your new app will launch with hundreds of users, your app is likely to break if you don’t correctly predict the compute needs of the next 1,000 users. For unpredictable workloads, rather than spending time developing complicated demand-forecasting algorithms, developers can rely on serverless GPUs to automatically scale compute alongside a project.
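As a toy illustration of what "automatically scale" means, the sketch below sizes a pool of GPU workers from the current request rate instead of from a forecast. It is not any provider's actual algorithm; `replicas_needed`, `capacity_per_replica`, and `max_replicas` are hypothetical names, and real autoscalers also account for cold starts, queue depth, and cooldown periods.

```python
import math


def replicas_needed(requests_per_second: float,
                    capacity_per_replica: float,
                    max_replicas: int = 8) -> int:
    """Number of GPU workers to run for the current load.

    Returns 0 when there is no traffic (scale to zero) and never exceeds
    `max_replicas`, a hypothetical cap standing in for a provider quota.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    if requests_per_second <= 0:
        return 0
    # Round up so that a partially loaded worker still gets provisioned.
    wanted = math.ceil(requests_per_second / capacity_per_replica)
    return min(max_replicas, wanted)
```

For example, if each worker can serve 5 requests per second, a load of 12 requests per second needs 3 workers, and an idle app needs none.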
As you may know all too well, setting up a GPU cloud instance is like navigating a maze of quota barriers, driver issues, and capacity shortages. Ideally, you would set up a single instance and use it for your entire project, but in practice, a project's GPU needs often change over time. Often enough, you end up wasting precious hours thinking about infrastructure when you could’ve been building your project. Serverless GPU cloud providers abstract away many of the pain points of this setup. This leaves you with more time to focus on what really matters—convincing your boss that 60% accuracy is high enough.
The consistency of a traditional GPU instance is hard to beat. As a developer, you can SSH into a dedicated cloud instance, set up your environment to match your needs, and code as if you had a local GPU. Moreover, this instance has instant access to a GPU without serverless cold start delays, and is guaranteed to be online for applications in which reliability is essential—military, healthcare, security, etc.
The development environment on a dedicated instance can be configured to precisely match the needs of almost any project. For example, you can match the hardware, drivers, and operating system of your end-state deployment. Additionally, traditional cloud instances provide 100% certainty that code is running in a specific environment, on specific hardware, and in a specific data center. This level of customization provides flexibility to adapt to almost any production environment. In contrast, code developed on serverless platforms is often optimized for deployment within the same provider. This limits your flexibility when a project scales beyond what the no-name cloud provider offering H100s for $1 per hour can handle.
All else being equal, traditional GPU instances are cheaper per GPU-hour than serverless GPUs. Serverless providers save you money when GPUs would otherwise sit idle, but that advantage disappears when a workload keeps the GPU busy around the clock. Therefore, steady, long-running workloads are often best suited for traditional instances.
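The trade-off can be written down as a break-even utilization. The sketch below assumes a simple model: a dedicated instance bills for every hour at a lower rate, while a serverless GPU bills only for busy time at a higher rate. The function names are hypothetical, any rates you pass in are your own inputs, and the model ignores cold starts, minimum billing increments, and committed-use discounts.

```python
def break_even_utilization(dedicated_rate: float, serverless_rate: float) -> float:
    """Fraction of the time a GPU must be busy before an always-on instance
    becomes cheaper than paying a serverless rate only for busy time."""
    if dedicated_rate <= 0 or serverless_rate <= 0:
        raise ValueError("rates must be positive")
    # Utilization cannot exceed 100%, so cap the ratio at 1.0.
    return min(1.0, dedicated_rate / serverless_rate)


def cheaper_option(utilization: float,
                   dedicated_rate: float,
                   serverless_rate: float) -> str:
    """Compare the cost of one wall-clock hour under each billing model."""
    dedicated_cost = dedicated_rate                   # billed whether busy or idle
    serverless_cost = serverless_rate * utilization   # billed only while busy
    return "dedicated" if dedicated_cost < serverless_cost else "serverless"
```

With made-up rates of $2.00 per hour dedicated and $4.00 per hour serverless, the break-even point is 50% utilization: a GPU that is busy 90% of the time is cheaper on a dedicated instance, and one that is busy 20% of the time is cheaper on serverless.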
Serverless GPUs and traditional cloud GPU instances each have good reasons to exist. Currently, serverless GPU providers prioritize flexibility, scaling, and simplicity, whereas consistency, customization, and cost are hallmarks of a traditional GPU cloud environment. Sometimes, a project may require the customization of a traditional instance while benefitting from the flexibility of serverless GPUs. This raises the question, "Is there a third option?"
In 2024, Thunder Compute launched a new type of hybrid cloud instance: a standard CPU-only instance that accesses virtual GPUs over a TCP network. These instances provide the consistent environment, customization, and cost of a traditional GPU instance along with the flexibility, efficiency, and simplicity of serverless GPUs. If you still remember this blog after your 3-year commitment at AWS ends, try Thunder Compute.⚡