How much memory do I actually need for local AI coding?

24GB unified memory on a Mac, or 12-16GB of VRAM on a PC, is the realistic floor. Budget roughly 0.6GB per billion parameters at Q4: a 7B model needs ~4-6GB, a 13B ~8-10GB, a 70B ~38-48GB. A 16GB-RAM laptop is not enough once the OS takes its cut.

Is the RTX 4090 laptop GPU the same as the desktop 4090?

No, and this is the most common buying mistake. The RTX 4090 laptop GPU has 16GB of GDDR6 on a smaller AD103 die (9,728 CUDA cores), versus the desktop 4090's 24GB and 16,384 cores. For local models, the 16GB cap is what matters — it holds roughly the same model size as a 24GB Mac despite costing more.

Can any laptop run a 70B coding model locally?

Only a Mac with 48-64GB of unified memory. A 70B at Q4 needs about 39GB, which is more than the 16GB any consumer laptop GPU holds. An M4 Max runs a 70B entirely in memory at roughly 15-25 tok/s and only draws ~80W doing it.

Should I even buy a laptop for this, or use the cloud?

If you'll run the model mostly at a desk, a desktop GPU or mini-PC is cheaper per GB of VRAM and runs cooler. If you just want a capable assistant, a cloud subscription beats local on quality-per-dollar. Local-on-laptop makes sense for privacy, offline work, or unmetered tinkering.

Best Laptops to Run Local AI Coding Models (2026)

As an Amazon Associate, StackBrief earns from qualifying purchases.

Most "best laptop for AI" lists rank machines by GPU speed. For running local AI coding models, that's the wrong spec. The thing that decides whether a model runs on your laptop at all is memory — how much of it you have, and whether the GPU can reach it. Speed only matters between two laptops that already have enough.

So before the picks, the one rule that does all the work: budget roughly 0.6GB per billion parameters at Q4_K_M, plus a little context headroom. A 7B model wants ~4-6GB, a 13B ~8-10GB, and a 70B a hefty ~38-48GB. Hold that number in your head and most of the marketing noise falls away.
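That rule is easy to turn into a quick calculator. The sketch below applies the 0.6GB-per-billion figure from the text; the 1GB `context_headroom_gb` default and the function name are assumptions made for illustration, since real context (KV-cache) use grows with the context length you configure.

```python
GB_PER_BILLION_Q4 = 0.6  # the rule of thumb quoted above for Q4_K_M weights


def estimate_q4_memory_gb(params_billion: float, context_headroom_gb: float = 1.0) -> float:
    """Approximate memory needed to load a Q4 model with some room for context.

    context_headroom_gb is an assumed placeholder, not a measured value.
    """
    if params_billion <= 0:
        raise ValueError("params_billion must be positive")
    return params_billion * GB_PER_BILLION_Q4 + context_headroom_gb


# Print the estimate for the three sizes discussed in the text.
for size in (7, 13, 70):
    print(f"{size}B -> ~{estimate_q4_memory_gb(size):.1f} GB")
```

With the assumed 1GB of headroom, all three estimates land inside the ranges quoted above.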

And the contrarian bit up front: for a lot of vibe coders, the honest answer is don't buy a laptop for this at all. A cloud plan beats local on quality-per-dollar, and a desktop beats a laptop on VRAM-per-dollar. Local-on-laptop earns its place for privacy, offline work, or no-metering tinkering — not because it's cheaper or better. If a desk is where you'll actually sit, read the desktop-GPU companion guide first.

With that said, here are the laptops worth it — ranked by how well they serve a real local-coding workflow through Ollama or LM Studio.

The sweet spot: MacBook Pro M4 Pro (24GB)

If you want one recommendation, this is it. Apple's unified memory is shared between CPU and GPU, so the model loads once with no PCIe copy step — the GPU reads weights from the same address space the CPU uses. That's why a 24GB Mac punches above a 16GB discrete GPU on model capacity.

Twenty-four gigs comfortably runs a 14-20B coding model, or a MoE with a similar footprint such as Qwen3 Coder 30B-A3B (around 17GB at Q4), at roughly 30-35 tok/s, which is genuinely interactive. And unlike the fanless Air, the Pro has actual fans, so it sustains that pace through a long session instead of throttling after twenty minutes.

MacBook Pro 14-inch M4 Pro (24GB) runs about $1,999-$2,399. With Apple now on the M5 generation, M4 configs are frequently discounted — which is exactly what makes this the value-per-capability pick of the bunch. Weigh an M4-on-sale against a current M5 before you buy.

Who it's for: anyone who wants the best balance of model size, speed, and sustained performance without going full workstation.

The only laptop that runs 70B: MacBook Pro M4 Max (48-64GB)

This is the one class of laptop that runs 70B-class coding models entirely in memory — something no consumer laptop GPU can do. A 70B at Q4 needs ~39GB; the most VRAM any laptop GPU holds is 16GB. There's no contest because the PC side simply can't enter the race.
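The capacity argument comes down to one comparison, sketched here. The 39GB figure and the memory sizes come from the text; the `Laptop` type, the `fits_in_memory` helper, and the 6GB `reserved_gb` set aside for the OS and context are hypothetical values chosen for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Laptop:
    name: str
    gpu_memory_gb: float  # VRAM, or total unified memory on a Mac


def fits_in_memory(model_gb: float, laptop: Laptop, reserved_gb: float = 6.0) -> bool:
    """True if the model fits after setting aside reserved_gb for OS and context.

    reserved_gb is an assumed figure for illustration only.
    """
    return model_gb <= laptop.gpu_memory_gb - reserved_gb


MODEL_70B_Q4_GB = 39.0  # figure quoted in the text

LAPTOPS = [
    Laptop("RTX 4090 laptop", 16),
    Laptop("MacBook Pro M4 Max (48GB)", 48),
    Laptop("MacBook Pro M4 Max (64GB)", 64),
]

for laptop in LAPTOPS:
    verdict = "fits" if fits_in_memory(MODEL_70B_Q4_GB, laptop) else "does not fit"
    print(f"{laptop.name}: 70B at Q4 {verdict}")
```

Under these assumptions the 48GB configuration clears the bar with only a few gigabytes to spare, which is why the 64GB option is worth considering if you plan on long contexts.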

The M4 Max handles a 70B at roughly 15-25 tok/s and — this is the genuinely surprising part — draws only about 80W doing it, staying quiet and cool. A gaming laptop spins its fans to a roar under far lighter loads.

MacBook Pro M4 Max (64GB) runs roughly $3,199-$4,000. That's real money, so be honest about whether you need a 70B model on the go. If you do, this is the only laptop that delivers it.

Who it's for: people whose work genuinely requires a big local model away from a desk. Everyone else: a smaller model on cheaper hardware is the smarter spend.

The budget Mac: MacBook Air M4 15-inch (24GB)

The cheapest credible local-AI-coding laptop. The same 24GB of unified memory runs 14-20B models cleanly for an interactive Ollama-backed assistant, at roughly 25-30 tok/s before thermals kick in.

The caveat, stated plainly because it's the whole catch: it's fanless, so it throttles 15-25% after 20-30 minutes of sustained inference. That's fine for chat-style coding help where you think between prompts; it's not great for long batch jobs or long-context generation. And skip the 16GB base config entirely — once macOS takes its cut, 16GB is not enough for serious local models.
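To see what that throttling means in practice, apply the 15-25% drop to the Air's 25-30 tok/s starting range. This is plain arithmetic on the figures above; the function name is illustrative.

```python
def throttled_range(speed_range, throttle_range):
    """Return (worst, best) tok/s after sustained-load throttling.

    speed_range: (low, high) tok/s before throttling.
    throttle_range: (low, high) fractional slowdown, e.g. (0.15, 0.25).
    """
    low_speed, high_speed = speed_range
    low_cut, high_cut = throttle_range
    worst = low_speed * (1 - high_cut)   # slowest start, heaviest throttle
    best = high_speed * (1 - low_cut)    # fastest start, lightest throttle
    return worst, best


worst, best = throttled_range((25, 30), (0.15, 0.25))
print(f"Sustained speed: about {worst:.0f}-{best:.0f} tok/s")
```

That works out to somewhere around 19 to 25 tok/s once the chassis heats up, which is still usable for chat-style prompts.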

MacBook Air 15-inch M4 (24GB) runs about $1,299-$1,499 — the lowest entry price here that still clears the memory floor.

Who it's for: budget-conscious vibe coders who want local model help in bursts and can live with throttling on long runs.

The fast Windows pick: RTX 4090 laptop (16GB VRAM)

If you're on Windows or Linux and want the most tokens-per-second per model size, a discrete RTX 4090 laptop is the fastest thing here for models that fit in its memory — on the order of 100+ tok/s on a 7B. Per-token, a discrete 4090 beats Apple Silicon at the same model size.

Now the honest caveats, because this pick is full of them. The 4090 laptop GPU has 16GB, not the desktop's 24GB, and it uses the smaller AD103 die with 9,728 CUDA cores. So it caps at roughly the same model size as a 24GB Mac (14-20B) while costing more. It runs hot (58-64°C chassis) and loud, and gets about 38 minutes of battery under light coding. Sustained local inference on a gaming laptop is, realistically, a plugged-in activity.

RTX 4090 gaming laptop (16GB) runs roughly $2,499-$3,299.

Who it's for: people who also game or train models, are happy running plugged in, and want raw speed on small-to-mid models. Buying one purely for local AI coding is hard to justify next to a same-capacity Mac.

The value Windows pick: RTX 4070 laptop (12GB VRAM)

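The sensible Windows choice. 12GB of VRAM handles 7B-13B coding models well, the bread-and-butter local-assistant size, at high tok/s. And unlike the 4080/4090 laptops, a 4070 sits in a slimmer chassis that sustains performance without heavy throttling, lasts roughly 6-8 hours under light use, and runs a much cooler 42-47°C under load. Check the VRAM figure on the exact configuration before you buy, though: NVIDIA specifies the RTX 4070 Laptop GPU at 8GB, and a 13B model at Q4 wants 8-10GB.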

You give up the ability to run 14B+ models, but if your local assistant lives in the 7B-13B range (and for most coding help, it can), you get CUDA, a quieter machine, and real battery life for far less money.

RTX 4070 gaming laptop (12GB) runs about $999-$1,399 — the cheapest entry here, period.

Who it's for: Windows users who want CUDA plus a smaller local model, don't need 30B+, and care about heat and battery.

Quick comparison

| Laptop | Memory | Max model | Speed | Catch |
|---|---|---|---|---|
| MacBook Pro M4 Pro | 24GB unified | 14-20B | ~30-35 tok/s | Best all-round |
| MacBook Pro M4 Max | 48-64GB unified | 70B | ~15-25 tok/s | Expensive |
| MacBook Air M4 15" | 24GB unified | 14-20B | ~25-30 tok/s | Fanless, throttles |
| RTX 4090 laptop | 16GB VRAM | 14-20B | ~100+ tok/s (7B) | Hot, loud, ~38min battery |
| RTX 4070 laptop | 12GB VRAM | 7-13B | high on 7-13B | No 14B+ models |

One note worth keeping: on the same Mac, the MLX runtime runs about 10-25% faster than llama.cpp, so if you go Apple, it's worth trying both. If you're still deciding which runner to use at all, the LM Studio vs Ollama breakdown covers it.
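Comparing runtimes only works if you measure tok/s the same way on each. Here is a minimal sketch for the Ollama side, assuming a server on its default local port and a non-streaming call to `/api/generate`, whose response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The helper names are illustrative, and an MLX-based runner would need its own reported figure.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / (duration_ns / 1e9)


def measure(model: str, prompt: str) -> float:
    """Run one non-streaming generation against a local Ollama server; return tok/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

Calling `measure()` with the tag of a model you have already pulled and the same prompt you use on the other runtime gives a number you can compare directly. Run it a few times, since the first call also includes model load time in the total request (though not in `eval_duration`).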

The bottom line

Rank by memory, not GPU benchmarks. For most people the M4 Pro at 24GB is the right answer — enough capacity for real coding models, fans to sustain it, and a discounted price now that the M5 exists. Need a 70B on the go? Only the M4 Max does it. On a budget? The M4 Air or the RTX 4070 clear the floor for the least money, each with a caveat (throttling on the Air, a 13B ceiling on the 4070). The RTX 4090 laptop is only worth it if you game or train too — for pure local AI coding, its 16GB cap makes it an expensive way to land where a 24GB Mac already is.
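One way to sanity-check that ranking is dollars per gigabyte of model-usable memory. The sketch below uses the price ranges and memory sizes quoted in this guide and takes the midpoint of each range, which is a simplifying assumption. It also treats all unified memory as usable, which flatters the Macs slightly.

```python
# (memory in GB, low price, high price) as quoted in this guide
LAPTOPS = {
    "MacBook Pro M4 Pro": (24, 1999, 2399),
    "MacBook Pro M4 Max": (64, 3199, 4000),
    "MacBook Air M4 15-inch": (24, 1299, 1499),
    "RTX 4090 laptop": (16, 2499, 3299),
    "RTX 4070 laptop": (12, 999, 1399),
}


def dollars_per_gb(memory_gb: float, low: float, high: float) -> float:
    """Midpoint price divided by memory the model can be loaded into."""
    return ((low + high) / 2) / memory_gb


# Cheapest memory first.
ranked = sorted(LAPTOPS, key=lambda name: dollars_per_gb(*LAPTOPS[name]))

for name in ranked:
    print(f"{name}: ${dollars_per_gb(*LAPTOPS[name]):.0f} per GB")
```

On these figures the RTX 4090 laptop is the most expensive memory by a wide margin, which matches the point that it only earns its price if you also want the speed for gaming or training.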

And the part most lists won't tell you: if you'll mostly sit at a desk, or you just want a good assistant rather than a private one, you'll get more for your money from a desktop GPU or a cloud plan. Buy the laptop for the model you'll genuinely run away from a wall outlet — nothing more.
