Azure GPU instances give you access to NVIDIA A100, H100, and H200 hardware on demand. Navigating three VM families, a quota system that blocks you by default, and per-hour costs that can exceed $12 per GPU takes real effort. This guide covers everything developers and ML engineers need to make the right call.
Key takeaways:
- Azure GPU VMs fall into three series: ND (large-scale AI training), NC (general ML and inference), and NV (visualization and VDI)
- The cheapest single H100 Azure offers is the NCads H100 v5 at roughly $8.30/hr on-demand in East US
- Spot pricing can cut costs by up to 80%, but Azure gives only 30 seconds of eviction notice
- For smaller-scale training and inference, specialized GPU clouds can be 3 to 5 times cheaper with no quota overhead
For a side-by-side comparison of costs across more than a dozen providers, see Thunder Compute's guide to the cheapest cloud GPU providers.
What Azure GPU Instances Are (and When You Need One)
A GPU instance is a cloud VM with one or more Graphics Processing Units attached. Unlike CPUs, which handle a small number of complex tasks in sequence, GPUs run thousands of simpler operations in parallel. That makes them the right tool for model training, fine-tuning, batch inference, and graphics-intensive workloads.
Azure groups all its GPU VMs under the N-family. The letter after the N indicates the purpose: ND for deep learning and AI training, NC for general compute and ML, NV for visualization and remote desktops. Within each family there are multiple generations, each tied to specific GPU hardware.
A GPU instance earns its cost when your workload hits the ceiling of what a CPU can do efficiently. Training a transformer model, running inference on a 70B-parameter LLM, and rendering 3D scenes all finish significantly faster on the right GPU instance.
How to Read an Azure GPU VM Name
Azure VM names look cryptic at first. Breaking one down removes most of the confusion.
Take Standard_ND96isr_H100_v5 from left to right:
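- Standard is the pricing tier
- ND is the VM family
- 96 is the vCPU count
- i means the size is isolated (the VM is dedicated to a single customer on its host)
- s means premium storage is supported
- r means RDMA is available, which is what the InfiniBand network relies on
- H100 is the GPU model
- v5 is the generation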
Newer SKU names like this one mention the GPU model explicitly, making the current lineup easier to parse than older generations.
The NV family follows the same pattern. Standard_NV36ads_A10_v5 is an NV-family VM with 36 vCPUs, an AMD CPU (a), a local temp disk (d), and premium storage support (s), powered by a full A10 GPU, generation 5.
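The naming pattern above can be sketched as a small shell helper. This is an illustrative parser rather than an official Azure tool: the function name `parse_sku` is hypothetical, and the pattern only covers newer SKU names that embed the GPU model.

```shell
# parse_sku SKU_NAME
# Hypothetical helper: split a newer-style Azure GPU SKU name of the form
# <tier>_<family><vcpus><feature letters>_<gpu>_<version> into labeled parts.
# Prints nothing if the name does not match that shape.
parse_sku() {
  printf '%s\n' "$1" | sed -n -E \
    's/^([A-Za-z]+)_([A-Z]+)([0-9]+)([a-z]*)_([A-Za-z0-9]+)_(v[0-9]+)$/tier=\1 family=\2 vcpus=\3 features=\4 gpu=\5 version=\6/p'
}

parse_sku Standard_ND96isr_H100_v5
parse_sku Standard_NV36ads_A10_v5
```

The feature letters come back as a single string (for example `isr` or `ads`) so each letter can be looked up individually.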
Azure GPU Instance Series: ND, NC, and NV Explained
ND Series: Purpose-Built for AI Training
The ND series is Azure's highest-performance GPU family, designed for large-scale distributed training. Current-generation ND instances use NVIDIA H100 SXM5 GPUs connected by NVLink 4 and backed by 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand per GPU (3.2 Tbps per VM) for inter-node communication. The flagship SKU, ND96isr_H100_v5, packs 8 H100 SXM5 GPUs, 96 vCPUs, and 1.9 TB of RAM into a single instance.
One hard constraint is that there is no single-GPU ND instance. The minimum configuration is eight GPUs. The ND family also carries Azure's newest hardware. ND H200 v5 instances previewed in late 2025, and select GB200 configurations have been announced for 2026.
NC Series: General ML Training and Applied AI
The NC series covers compute-intensive ML workloads that do not need InfiniBand or full NVLink configurations. The two active NC generations are:
- NC A100 v4: NVIDIA A100 PCIe 80 GB
- NCads H100 v5: a single NVIDIA H100 NVL GPU with 94 GB of HBM3 memory
The NCads H100 v5 is Azure's only single-GPU H100 offering, at roughly $8.30/hr on-demand in East US. It lacks the SXM interconnect of the ND series, making it unsuitable for multi-node distributed training. Even so, it handles single-node fine-tuning, batch inference, and applied AI workloads well.
NV Series: Visualization, Inference, and VDI
The NV series targets graphics rendering, virtual desktop infrastructure (VDI), and lighter AI inference workloads. The current generation, NVads A10 v5, offers fractional configurations from 1/6 of a GPU (4 GB VRAM) up to a full A10 (24 GB VRAM). Each NV instance includes an NVIDIA GRID license, enabling virtual workstation and multi-user remote desktop scenarios.
Two older NV generations, NVv3 and NVv4, are being retired on September 30, 2026.
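For pure ML inference, the NV series is cost-effective but VRAM-constrained. A full A10 gives you 24 GB, which handles models of roughly 7B parameters in FP16 (about 14 GB of weights). Models of 13B parameters and up fit only with 8-bit or 4-bit quantization, since 13B parameters in FP16 need about 26 GB for the weights alone. If your inference workload needs 48 GB or more, the NC series or a specialized GPU cloud is a better fit.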
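A rough way to check whether a model fits in a given amount of VRAM is to multiply the parameter count by the bytes per parameter. The sketch below does only that. The helper name `weights_gb` is hypothetical, and the result covers weights alone, so the KV cache and activations need headroom on top.

```shell
# weights_gb PARAMS_IN_BILLIONS BITS_PER_PARAM
# Approximate size of the model weights in GB (1 GB = 10^9 bytes).
# Hypothetical helper; it ignores KV cache, activations, and framework overhead.
weights_gb() {
  awk -v p="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", p * bits / 8 }'
}

weights_gb 7 16    # 7B parameters in FP16
weights_gb 13 16   # 13B parameters in FP16
weights_gb 13 8    # 13B parameters at 8-bit
weights_gb 70 4    # 70B parameters at 4-bit
```

A result close to or above the card's 24 GB suggests that the model needs a lower precision or a larger GPU.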
Quick-Pick: Which Azure Series Fits Your Workload?
| Workload | Recommended Series | Instance | GPU |
|---|---|---|---|
| Large-scale LLM training (70B+ parameters) | ND | ND96isr H100 v5 | 8x H100 SXM5 80 GB |
| Single-GPU fine-tuning and applied AI | NC | NCads H100 v5 | 1x H100 NVL 94 GB |
| Batch inference and smaller training runs | NC | NC24ads A100 v4 | 1x A100 PCIe 80 GB |
| Visualization, VDI, and remote desktops | NV | NVads A10 v5 | 1/6 to 1x A10 (4-24 GB) |
| Budget experimentation and fault-tolerant training | NC (spot) | NC24ads A100 v4 spot | 1x A100 PCIe 80 GB |
Azure GPU Pricing in 2026
On-Demand Pricing by GPU Type
Azure bills GPU VMs by the hour under the pay-as-you-go model, with no minimum commitment. Prices vary by region and instance type. The figures below are East US on-demand rates as of July 2026.
| Instance | GPU | GPU VRAM | Azure On-Demand ($/hr) | Best For |
|---|---|---|---|---|
| NV36ads A10 v5 | 1x NVIDIA A10 | 24 GB | $3.20 | Inference, VDI, visualization |
| NC24ads A100 v4 | 1x NVIDIA A100 PCIe | 80 GB | $4.41 | Single-GPU ML training and inference |
| NC40ads H100 v5 | 1x NVIDIA H100 NVL | 94 GB | $8.30 | Applied AI training and batch inference |
| ND96isr H100 v5 | 8x NVIDIA H100 SXM5 | 640 GB | $98.32 | Large-scale distributed training |

East US on-demand pricing as of July 2026. Prices vary by region and change frequently. Verify on the Azure pricing page before budgeting.
Reserved Instance Pricing: Committing for a Discount
Azure offers significant discounts in exchange for 1-year or 3-year commitments to specific VM sizes and regions. Using the ND H100 v5 as an example:
| Pricing Model | $/GPU/hr (ND H100 v5, East US) | Discount vs On-Demand |
|---|---|---|
| On-demand (pay-as-you-go) | ~$12.29 | Baseline |
| 1-year reserved | ~$7.93 | ~35% |
| 3-year reserved | ~$5.47 | ~55% |

Reserved pricing requires committing to a specific region and instance type. Spot capacity varies by availability and can be reclaimed with 30 seconds of notice.
Reserved instances make sense for predictable, sustained GPU workloads where you can commit upfront. The 3-year rate brings Azure closer to specialized cloud on-demand pricing, but locks you into a specific region and instance type for an extended period.
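The per-GPU figures in this section follow from simple arithmetic, sketched below with `awk`. The helper names are hypothetical. The break-even calculation assumes a reservation is billed for every hour of the term whether or not the VM runs, so it is only a first-order estimate.

```shell
# per_gpu_rate VM_HOURLY_RATE GPU_COUNT -> hourly cost per GPU
per_gpu_rate() {
  awk -v vm="$1" -v n="$2" 'BEGIN { printf "%.2f\n", vm / n }'
}

# discount_pct ON_DEMAND_RATE COMMITTED_RATE -> percent saved vs on-demand
discount_pct() {
  awk -v od="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (1 - c / od) * 100 }'
}

# breakeven_util_pct ON_DEMAND_RATE COMMITTED_RATE
# Share of hours the VM must actually run before the commitment is cheaper
# than paying on-demand only for the hours used.
breakeven_util_pct() {
  awk -v od="$1" -v c="$2" 'BEGIN { printf "%.1f\n", c / od * 100 }'
}

per_gpu_rate 98.32 8            # ND96isr H100 v5, 8 GPUs
discount_pct 12.29 7.93         # 1-year reserved
discount_pct 12.29 5.47         # 3-year reserved
breakeven_util_pct 12.29 7.93   # utilization needed for 1-year to pay off
```

With the rates in the table, a 1-year reservation starts to pay off once the VM is busy for roughly two thirds of the hours in the term.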
Spot Instance Pricing: The 80% Discount with a Catch
Azure Spot VMs let you run GPU instances on unused capacity at 70 to 82% below on-demand rates. Azure can reclaim a spot instance with only 30 seconds of notice, a quarter of the 2-minute warning AWS provides. Checkpoint frequency is critical: for large jobs, aim for intervals no longer than 5 to 10 minutes.
Spot works well for fault-tolerant training jobs where your script can checkpoint and resume automatically. It is not suitable for production inference, interactive workloads, or any job where interruption causes data loss.
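One way to react to the 30-second notice is to poll the Scheduled Events endpoint of the Azure instance metadata service from inside the VM and trigger a checkpoint when a `Preempt` event appears. The sketch below assumes it runs on the Spot VM itself. The function names and the `./save_checkpoint.sh` script are hypothetical, and the 5-second polling interval is an arbitrary choice.

```shell
# Scheduled Events endpoint of the instance metadata service (reachable only
# from inside the VM).
EVENTS_URL="http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

# has_preempt: read a Scheduled Events JSON document on stdin and succeed if
# it contains an event of type Preempt (the type used for Spot evictions).
has_preempt() {
  grep -q '"EventType"[[:space:]]*:[[:space:]]*"Preempt"'
}

# watch_for_eviction CHECKPOINT_COMMAND [ARGS...]
# Poll every 5 seconds and run the given command once an eviction is announced.
watch_for_eviction() {
  while true; do
    if curl -s -H "Metadata:true" --noproxy "*" "$EVENTS_URL" | has_preempt; then
      echo "Eviction notice received, checkpointing" >&2
      "$@"
      return 0
    fi
    sleep 5
  done
}

# Example (not run here; save_checkpoint.sh is a hypothetical script):
#   watch_for_eviction ./save_checkpoint.sh &
```

This watcher complements periodic checkpoints rather than replacing them, because a large model may not finish writing within 30 seconds.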
Regional Price Variations
Azure GPU pricing varies significantly across regions. East US is consistently the cheapest US location. West US 2 runs roughly 7% higher for H100 instances, West Europe adds about 19%, Southeast Asia around 30%, and Australia East around 33%.
For non-latency-critical workloads like model training on private data, East US will save meaningful money over APAC or European regions.
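To see what those percentages mean per GPU-hour, the sketch below applies each uplift to an East US base rate. The helper name `regional_rate` is hypothetical, and applying the quoted uplifts to the $12.29 ND H100 v5 per-GPU rate is an assumption made for illustration. Real regional prices should be taken from the Azure pricing page.

```shell
# regional_rate BASE_RATE UPLIFT_PERCENT -> estimated hourly rate in the region
regional_rate() {
  awk -v base="$1" -v up="$2" 'BEGIN { printf "%.2f\n", base * (1 + up / 100) }'
}

regional_rate 12.29 7     # West US 2
regional_rate 12.29 19    # West Europe
regional_rate 12.29 30    # Southeast Asia
regional_rate 12.29 33    # Australia East
```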
The Hidden Costs of Azure GPU Instances
The advertised hourly rate is the floor, not the ceiling. Several additional costs stack on top of GPU compute.
Storage. Azure GPU VMs include local NVMe temp storage, but it is wiped on deallocation. Persistent storage for datasets, checkpoints, and model artifacts requires Azure Blob Storage (roughly $18 per TB per month for hot tier, LRS redundancy in East US) or Managed Disks, billed even when the VM is stopped.
Egress. Azure charges for data transferred out of its network. Downloading model weights to a local machine, serving inference results to external applications, or moving artifacts between regions all incur egress fees. A 100 GB model weight download costs roughly $9 at standard egress rates.
Azure ML surcharges. Using Azure Machine Learning as a managed layer on top of raw VM instances adds approximately 25% to compute costs.
Stopped vs. deallocated VMs. Shutting down a VM from inside the guest OS, or with az vm stop, leaves it in the "Stopped" state, which keeps the hardware reserved and still charges for compute. Only a deallocated VM stops compute billing. Storage charges continue in both states. Build your shutdown scripts to explicitly deallocate: az vm deallocate --resource-group my-gpu-rg --name my-h100-vm, and set Azure Cost Management alerts to catch runaway spend.
Idle GPU time. Hourly billing continues whether your GPU is training or sitting idle between runs. Teams without auto-scaling or job scheduling often find a large share of their Azure GPU bill covers idle capacity.
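The items above can be combined into a rough monthly estimate. The sketch below uses the approximate rates quoted in this section ($18 per TB-month of storage, about $9 per 100 GB of egress, and a 25% Azure ML uplift). The function name and its argument order are hypothetical, and actual bills depend on region, tier, and redundancy.

```shell
# monthly_cost GPU_HOURS HOURLY_RATE STORAGE_TB EGRESS_GB USE_AZURE_ML(0 or 1)
# Rough estimate built from the approximate rates quoted in the text above.
monthly_cost() {
  awk -v h="$1" -v rate="$2" -v tb="$3" -v gb="$4" -v ml="$5" 'BEGIN {
    compute   = h * rate
    surcharge = ml ? compute * 0.25 : 0   # approx. 25% managed-layer uplift
    storage   = tb * 18                   # approx. 18 USD per TB-month, hot tier
    egress    = gb * 0.09                 # approx. 9 USD per 100 GB
    printf "compute=%.2f surcharge=%.2f storage=%.2f egress=%.2f total=%.2f\n",
           compute, surcharge, storage, egress,
           compute + surcharge + storage + egress
  }'
}

# 200 GPU-hours on an NC40ads H100 v5 at $8.30/hr, 2 TB stored, 100 GB egress,
# launched through Azure ML.
monthly_cost 200 8.30 2 100 1
```

In this example the extras add $460 to a $1,660 compute bill, which is why the hourly rate alone understates the total.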
Azure GPU vs AWS, GCP, and Specialized Providers
Azure sits in the middle of the three major hyperscalers on on-demand price for the GPUs that matter most to ML workloads: above AWS, below Google Cloud, and well above specialized providers. The table below shows on-demand, single-GPU pricing across providers for A100 and H100, the two GPUs Azure actually offers for training and inference.
| Provider | A100 80 GB ($/hr) | H100 80 GB ($/hr) |
|---|---|---|
| Thunder Compute | $1.09 | $2.19 |
| RunPod | $1.39 | $2.89 |
| Crusoe Cloud | $2.00 | $3.90 |
| Lambda | $2.79 | $3.99 |
| AWS | $3.43 (p4de.24xlarge) | $6.88 (p5.48xlarge) |
| Azure | $4.41 (NC24ads A100 v4) | $8.30 (NC40ads H100 v5) |
| Google Cloud | $5.07 (a2-ultragpu-1g) | $11.06 (a3-highgpu-1g) |

On-demand, single-GPU pricing as of July 2026. AWS A100 is per-GPU from the p4de.24xlarge (8-GPU node). GCP H100 is per-GPU from the a3-highgpu-1g. Prices are subject to change; verify with each provider before budgeting.
Thunder's on-demand H100 rate of $2.19/hr is lower than Azure's best-case spot price (roughly $2.25 to $3.69/GPU/hr), without any eviction risk or upfront commitment. There are no egress fees, no managed service surcharges, and no Blob Storage charges layered on top.
Read a direct comparison of Azure NC A100 v4 vs Thunder Compute.
How to Access Azure GPU Instances (and Why It Is Harder Than It Looks)
The GPU Quota Problem
Every Azure subscription starts with a vCPU quota of zero for GPU VM series in every region. You cannot launch a single GPU instance until you submit a quota increase request and wait for approval. For popular series like ND H100 v5, approval can take anywhere from a few days to several weeks, depending on the region and the requested quota size.
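Because quota is counted in vCPUs, the number to request can be read off the SKU name. The sketch below computes it and wraps `az vm list-usage` to show current limits for a region. Both function names are hypothetical, and filtering the table with `grep` assumes the family name shown by Azure contains the GPU model, which is worth checking against the unfiltered output.

```shell
# vcpus_needed SKU_NAME VM_COUNT
# Read the vCPU count embedded in the SKU name and multiply by the VM count.
vcpus_needed() {
  per_vm=$(printf '%s\n' "$1" | sed -n -E 's/^[A-Za-z]+_[A-Z]+([0-9]+).*/\1/p')
  echo $(( ${per_vm:-0} * $2 ))
}

# show_gpu_quota REGION PATTERN
# List current usage and limits, keeping rows that match PATTERN (e.g. H100).
# Requires an authenticated Azure CLI session; not called in this sketch.
show_gpu_quota() {
  az vm list-usage --location "$1" --output table | grep -i "$2"
}

vcpus_needed Standard_NC40ads_H100_v5 2   # two single-GPU H100 VMs
vcpus_needed Standard_ND96isr_H100_v5 1   # one 8-GPU ND node

# Example (not run here): show_gpu_quota eastus H100
```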
Spinning Up a GPU VM with Azure CLI
Once quota is approved, the Azure CLI is the fastest way to launch a GPU instance. The following command creates an NCads H100 v5 VM with Ubuntu 22.04:
az group create --name my-gpu-rg --location eastus
az vm create \
  --resource-group my-gpu-rg \
  --name my-h100-vm \
  --image Ubuntu2204 \
  --size Standard_NC40ads_H100_v5 \
  --admin-username azureuser \
  --generate-ssh-keys
After creation, install the NVIDIA drivers using Azure's GPU driver extension:
az vm extension set \
  --resource-group my-gpu-rg \
  --vm-name my-h100-vm \
  --name NvidiaGpuDriverLinux \
  --publisher Microsoft.HpcCompute \
  --version 1.9
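Once the extension has finished installing, it is worth confirming that the GPU is visible before starting a job. The sketch below runs `nvidia-smi` on the VM through `az vm run-command invoke`, so no SSH session is needed. The wrapper name `check_gpu` is hypothetical, and the resource names match the examples above.

```shell
# check_gpu RESOURCE_GROUP VM_NAME
# Run nvidia-smi on the VM via the Azure run-command channel.
# Requires an authenticated Azure CLI session; not called in this sketch.
check_gpu() {
  az vm run-command invoke \
    --resource-group "$1" \
    --name "$2" \
    --command-id RunShellScript \
    --scripts "nvidia-smi"
}

# Example (not run here): check_gpu my-gpu-rg my-h100-vm
```

If the output does not list the GPU, the driver installation has probably not completed yet or has failed.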
A Faster Path: Thunder Compute with VS Code or Cursor
If you want to skip resource groups, quota requests, and driver extensions, the Thunder Compute extension for VS Code and Cursor lets you launch a GPU instance and connect directly from your editor in under a minute. There are no quotas to request, no regions to select upfront, and GPU instances are available on demand.
Azure GPU Cost Optimization
Spot vs Reserved vs On-Demand: Choosing the Right Tier
On-demand pricing suits short, unpredictable, or in-development workloads. You pay the highest rate but have no commitment and can stop at any time.
Spot is ideal for fault-tolerant training jobs with frequent checkpointing. The 70-82% discount is substantial, but your training script must handle interruptions gracefully.
Reserved instances make sense for production inference serving or long-running training programs where you can project GPU-hour consumption 12 months out. The 1-year commitment delivers roughly 35% savings. Azure Compute Savings Plans are an alternative: you commit to a fixed hourly spend rather than a specific VM size, giving more flexibility to shift between instance types.
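For the Spot tier, the request differs from an on-demand launch by only three flags. The sketch below wraps `az vm create` with `--priority Spot`, an eviction policy, and `--max-price -1`, which caps the price at the on-demand rate so the VM is evicted only for capacity reasons. The function name is hypothetical, and the A100 size and Ubuntu image are illustrative choices.

```shell
# create_spot_gpu_vm RESOURCE_GROUP VM_NAME
# Launch a single-A100 Spot VM that is deallocated (not deleted) on eviction.
# Requires approved Spot quota and an authenticated CLI; not called here.
create_spot_gpu_vm() {
  az vm create \
    --resource-group "$1" \
    --name "$2" \
    --image Ubuntu2204 \
    --size Standard_NC24ads_A100_v4 \
    --priority Spot \
    --eviction-policy Deallocate \
    --max-price -1 \
    --admin-username azureuser \
    --generate-ssh-keys
}

# Example (not run here): create_spot_gpu_vm my-gpu-rg my-spot-vm
```

With the `Deallocate` policy the disks survive an eviction, so checkpoints on the OS or data disk are still there on restart, though disk storage keeps billing.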
Right-Size Your Instances
The ND H100 v5 is built for distributed training across 8 GPUs. It is the wrong choice for running inference on a 13B model. Using ND-series pricing for single-model inference wastes most of what you are paying for. Match the instance to the workload: NC for single-GPU training and applied AI, NV for inference at smaller model sizes, ND only when you genuinely need multi-GPU scale.
Keep Data in the Same Region as Your Compute
Transferring data between Azure regions, or from Azure to on-premises storage, adds egress costs that grow with dataset size. Keep training datasets, model checkpoints, and inference artifacts in the same Azure region as your GPU VMs. Use Cool storage for checkpoints you access infrequently and Archive for compliance retention, where retrieval latency is acceptable.
When Azure GPU Instances Make Sense (and When They Do Not)
Azure GPU compute makes the most sense for organizations embedded in the Microsoft ecosystem: teams using Active Directory, Azure DevOps, or Azure ML, and workloads requiring compliance with GDPR, HIPAA, FedRAMP High, or ISO 27001. For large enterprises where GPU compute is one line item in a broader Azure contract, negotiated pricing can close some of the gap with specialized providers.
Azure is a harder choice for independent developers, research teams, and startups where GPU compute is the primary cost. On-demand rates 3-5x higher than specialized clouds, a quota system that adds days to first-launch timelines, and compounding hidden costs make Azure expensive in ways that are easy to underestimate before your first bill arrives.
For large-scale distributed training requiring NVLink, InfiniBand, and the latest NVIDIA silicon within an enterprise-grade environment, the ND series is competitive. For everything else, it is worth pricing out the alternatives before committing.
Last Thoughts on Azure GPU Instances
Azure's ND and NC series give enterprise teams access to powerful GPU hardware with the compliance, security, and ecosystem integration that hyperscalers do best. For developers and ML engineers who need GPU compute without the overhead, specialized providers offer the same NVIDIA hardware at roughly a third to a fifth of the cost, with no quota delays and no hidden fees.
For a broader look at how these specialized GPU clouds are reshaping the market, see Thunder Compute's guide to neoclouds.
Frequently Asked Questions
What Are the Azure GPU Instance Types?
Azure GPU VMs fall into three series under the N-family: ND (deep learning and AI training with A100, H100, H200, and GB200 GPUs), NC (general ML training and applied AI with A100 and H100), and NV (visualization, VDI, and lighter inference with A10 and AMD Radeon GPUs). Each series has multiple generations and size configurations.
How Much Does an Azure H100 GPU Instance Cost Per Hour?
The cheapest single-GPU H100 on Azure is the NCads H100 v5 at roughly $8.30/hr on-demand in East US. The 8-GPU ND96isr H100 v5 costs approximately $98.32/hr (around $12.29/GPU). Spot pricing can reduce costs by 70 to 82%, with 30 seconds of eviction notice. One-year reserved instances reduce on-demand rates by about 35%.
What Is the Difference Between Azure NC, ND, and NV Series?
NC-series VMs handle general ML training and applied AI using NVIDIA A100 and H100 PCIe GPUs. ND-series VMs are purpose-built for large-scale distributed training, featuring NVIDIA A100, H100, and H200 SXM GPUs with NVLink and InfiniBand. NV-series VMs target graphics rendering, virtual desktops, and visualization using NVIDIA A10 and AMD Radeon GPUs with GRID licensing included.
How Do I Get GPU Quota on Azure?
Every Azure subscription starts with zero vCPU quota for GPU VM series per region. Navigate to "Quotas" in the Azure portal, select "Compute," filter by the VM series you need, and submit an increase request with a business justification. Common series may be approved in a few days; ND H100 v5 quota in popular regions can take one to four weeks.
Can I Use Azure Spot Instances for GPU Training?
Yes. Azure Spot VMs for GPU instances offer 70 to 82% discounts below on-demand rates. The trade-off is a 30-second eviction notice, shorter than the 2-minute warning AWS provides. Spot suits training jobs that checkpoint progress frequently and is not appropriate for production inference or interactive workloads.
Is Azure Cheaper Than AWS or GCP for GPU Instances?
Not compared with AWS. Azure's on-demand A100 is $4.41/hr and its H100 is $8.30/hr, against $3.43/GPU/hr and $6.88/GPU/hr on AWS. GCP lists higher rates ($5.07/hr for A100 and $11.06/hr for H100) but applies automatic sustained-use discounts. Azure's advantage over AWS is single-GPU A100 access; AWS requires a minimum 8-GPU configuration.
What Is the Difference Between a Stopped and Deallocated Azure VM?
Stopping an Azure VM from inside the guest OS or with az vm stop leaves it allocated, so compute charges continue. Only deallocating the VM stops compute billing. Storage charges apply in both states. Use az vm deallocate in shutdown scripts rather than a simple OS-level stop, and set Azure Cost Management alerts to catch unexpected spend.
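A periodic sweep can catch VMs that were stopped but never deallocated. The sketch below lists VMs whose power state is `VM stopped` and deallocates each one. The function name is hypothetical, and it acts on every matching VM in the current subscription, so adding a resource-group filter is advisable before using it for real.

```shell
# deallocate_stopped_vms
# Find VMs that are stopped but still allocated (still billed for compute)
# and deallocate them. Requires an authenticated CLI; not called here.
deallocate_stopped_vms() {
  az vm list -d \
    --query "[?powerState=='VM stopped'].[resourceGroup, name]" \
    --output tsv |
  while read -r rg name; do
    echo "Deallocating $name in $rg"
    az vm deallocate --resource-group "$rg" --name "$name" --no-wait </dev/null
  done
}

# Example (not run here): deallocate_stopped_vms
```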
How Does Azure GPU Pricing Compare to Specialized GPU Clouds?
Specialized GPU clouds are substantially cheaper for on-demand access. Thunder Compute's A100 80 GB starts at $1.09/hr versus Azure's $4.41/hr, and its H100 PCIe starts at $2.19/hr versus Azure's $8.30/hr. Thunder's on-demand H100 rate is lower than Azure's best-case spot price, with no eviction risk, no egress fees, and no quota requirements.