ML training without the AWS bill shock

GPU prices are all over the place. Some platforms charge 10x others for the same hardware. Here's what ML teams are actually using and what it really costs.

Current GPU prices

The same H100 rents for anywhere from $2.49/hr to $4.00/hr depending on the platform. Spot instances are 50-70% cheaper but can disappear mid-training. Here's what things actually cost:

NVIDIA H100 (80GB)
The gold standard for training. HBM3 memory with up to 3.35TB/s of bandwidth (SXM).
Lambda Labs: $2.49/hr
RunPod: $2.69/hr
AWS: $4.00/hr
GCP: $3.74/hr
NVIDIA A100 (80GB)
Still the workhorse. Great for fine-tuning and medium runs.
Lambda: $1.29/hr
RunPod spot: $0.89/hr
Modal: $1.10/hr
RTX 4090 (24GB)
A consumer GPU, surprisingly capable for experimentation.
RunPod: $0.44/hr
Vast.ai: $0.29/hr
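To make the spread concrete, here's a quick back-of-envelope comparison in plain Python. The prices are hard-coded from the list above (a snapshot; they change often), and the 100 GPU-hour run is just an illustrative workload:

```python
# On-demand H100 prices from the list above (USD per GPU-hour).
H100_PRICES = {
    "Lambda Labs": 2.49,
    "RunPod": 2.69,
    "GCP": 3.74,
    "AWS": 4.00,
}

def run_cost(rate_per_hr: float, gpu_hours: float) -> float:
    """Total cost of a run at a given hourly rate."""
    return rate_per_hr * gpu_hours

# Cheapest to priciest for a hypothetical 100 GPU-hour training run:
for provider, rate in sorted(H100_PRICES.items(), key=lambda kv: kv[1]):
    print(f"{provider:12s} ${run_cost(rate, 100):7.2f}")
```

Same hardware, same 100 H100-hours: $249 on Lambda Labs versus $400 on AWS, before spot discounts even enter the picture.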
💡 Pro tip

Spot instances are 50-70% cheaper but can be terminated anytime. Use them for fault-tolerant training with checkpointing. On-demand for anything you can't restart.
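The checkpoint-and-resume pattern is simple enough to sketch in a few lines. This is a minimal stand-in (a JSON file instead of real model state, and a fake "train step"), but the shape is the same whether you're saving optimizer state with torch.save or anything else:

```python
import json
import os
import tempfile

def save_checkpoint(path, step, state):
    # Write to a temp file and rename so a preemption mid-write
    # can't leave a corrupt checkpoint behind (os.replace is atomic).
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    # Resume from the last saved step, or start fresh.
    if os.path.exists(path):
        with open(path) as f:
            ckpt = json.load(f)
        return ckpt["step"], ckpt["state"]
    return 0, {"loss": None}

def train(path, total_steps, stop_after=None):
    start, state = load_checkpoint(path)
    for step in range(start, total_steps):
        state["loss"] = 1.0 / (step + 1)   # stand-in for a real train step
        save_checkpoint(path, step + 1, state)
        if stop_after is not None and step + 1 == stop_after:
            return step + 1                # simulate a spot preemption
    return total_steps

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
train(ckpt, total_steps=10, stop_after=4)   # "preempted" after step 4
done = train(ckpt, total_steps=10)          # resumes at step 4, finishes
print(done)  # 10
```

The second call picks up at step 4 instead of step 0, which is exactly what makes spot pricing usable: a preemption costs you at most the work since the last checkpoint, not the whole run.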

Where to train

Modal
Fan favorite

The serverless GPU platform ML Twitter loves. Write Python, request a GPU in the function decorator (e.g. @app.function(gpu="A100")), and it runs in the cloud. Cold starts are ~1-2 seconds, and you get $30/month in free credits.

"Modal changed how I think about GPU compute. I went from spinning up EC2 instances to just running a Python script." — ML engineer at a Series A startup
A100: $1.10/hr · H100: $2.95/hr · $30/mo free
Lambda Labs
Best H100 price

If you need raw GPU hours at the best price, Lambda is hard to beat. H100s at $2.49/hr are the cheapest we've found with reliable availability. They also sell physical hardware.

H100: $2.49/hr · A100: $1.29/hr · 8xH100: $19.92/hr

Best for: serious training runs needing hours of uninterrupted compute.

RunPod
Cheapest spot

Spot instance marketplace, with A100s often under $1/hr. Instances can be preempted and availability varies, so it's best for batch jobs where you can checkpoint and resume.

"I run fine-tuning jobs on RunPod spot overnight. If preempted, checkpoint saves and restarts. Full LoRA fine-tune costs about $20." — Open source contributor
A100 spot: $0.89/hr · 4090: $0.44/hr
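The $20 figure in the quote above is plausible arithmetic: $20 at $0.89/hr is roughly 22.5 GPU-hours. Here's a tiny estimator (all parameters hypothetical) that also budgets for the work lost to preemptions, assuming you checkpoint on a fixed interval:

```python
def spot_cost(rate_per_hr, job_hours, preemptions=0, ckpt_interval_hr=0.5):
    """Estimated spot-job cost: base compute plus redone work.
    Checkpointing every ckpt_interval_hr means each preemption loses
    at most one interval of work (half an interval on average)."""
    lost_hours = preemptions * ckpt_interval_hr / 2  # average lost work
    return rate_per_hr * (job_hours + lost_hours)

# ~22.5h LoRA job at RunPod's $0.89/hr spot rate, two preemptions:
print(round(spot_cost(0.89, 22.5, preemptions=2), 2))  # 20.47
```

Even with a couple of preemptions, the overhead is cents as long as checkpoints are frequent; the real risk with spot is availability, not cost.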
Hugging Face
The hub

Not just a model hub anymore. Inference endpoints, AutoTrain for no-code fine-tuning, Spaces for demos. If you're working with open models, you're probably here anyway.

500K+ models · AutoTrain: no code · Inference: $0.06/hr+

Other platforms

Replicate
Pay per second · API-first

Run and fine-tune models via API. Great for deploying open models without managing infra.

Together AI
Inference + fine-tuning

Fast inference for open models. Often the cheapest way to run Llama, Mixtral, etc.

Vast.ai
GPU marketplace

Peer-to-peer GPU rental. Cheapest option, reliability varies. Good for experiments.

Anyscale
Ray-based · Enterprise

From the creators of Ray. Good for distributed training at scale. Enterprise-focused.


What to use

Experimenting? Modal free tier or RunPod spot. Vast.ai for absolute cheapest.

Fine-tuning? Modal or RunPod. Hugging Face AutoTrain for no-code.

Serious training? Lambda Labs for best H100 prices.

Production inference? Replicate or Together AI APIs. Modal for more control.

Enterprise? AWS/GCP if already there. More expensive but ecosystem benefits.
