Best laptop for Large Language Models (LLM)

Apple Silicon MacBook Pro with M2 Max chip running a 7B LLM model

If you’re in the market for the best laptop to handle large language models (LLMs), the top pick is a MacBook Pro with the M2 Max chip, 38 GPU cores, and 64GB of unified memory. It’s a game-changer, especially for folks who need that sweet spot of power and mobility.

For a comprehensive guide on the best Mac options for LLM, including desktop solutions, check out our detailed best Mac for LLM guide.

The 64GB version allows you to use about 48GB (75% of the entire pool) as VRAM, which is crucial for running LLMs efficiently. This laptop can smoothly run 34B models at 8-bit quantization and handle larger 70B models with decent context length. And if you’re feeling adventurous, you can tweak the memory limits to accommodate the massive 120B models!
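As a rough sanity check, you can estimate whether a quantized model fits in that 48GB budget. The sketch below is a ballpark calculation, assuming quantized weights plus roughly 20% overhead for the KV cache and buffers (the overhead factor is an assumption, not a measured figure):

```python
# Rough memory-footprint estimate for a quantized LLM on Apple Silicon.
# Assumes ~75% of unified memory is usable as VRAM (the default macOS limit)
# and ~20% overhead for KV cache and buffers; both figures are ballpark.

def model_size_gb(params_billion: float, bits_per_weight: float,
                  overhead: float = 1.2) -> float:
    """Approximate memory needed: quantized weights plus overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

unified_memory_gb = 64
usable_vram_gb = unified_memory_gb * 0.75  # ~48 GB

for params, bits in [(34, 8), (70, 4)]:
    need = model_size_gb(params, bits)
    verdict = "fits" if need <= usable_vram_gb else "does not fit"
    print(f"{params}B @ {bits}-bit needs ~{need:.0f} GB -> {verdict} in {usable_vram_gb:.0f} GB")
```

On these assumptions, a 34B model at 8-bit (~41GB) and a 70B model at 4-bit (~42GB) both squeeze into the 48GB budget, matching the claims above.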

The 400 GB/s memory bandwidth of the M2 Max chip is what sets this MacBook Pro apart, offering inference speeds that approach desktop GPU setups and leaving non-Apple laptops, which typically max out at 16GB of VRAM, in the dust.

Apple Silicon M2 Max chip with unified memory

The Apple Silicon M2 Max chip on a MacBook Pro motherboard includes everything in a single package: CPU, GPU, unified memory, and the controllers.

Non-Apple alternatives often struggle with models larger than 13B, and their performance diminishes significantly when handling even larger models such as 33B and 70B. This decline in efficiency is primarily due to their reliance on slower system memory – dual-channel DDR5-4800, for example, offers only 76.8 GB/s – especially when the model is split between the GPU’s VRAM and system RAM.

In contrast, the M2 Max’s unified memory system not only offers higher bandwidth but also ensures smoother operation. So, for those who can’t be tied down to a desktop, this MacBook Pro is your go-to. It balances portability with the power needed for LLM inference.
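Token generation is largely memory-bandwidth bound: every generated token streams the full set of weights through memory, so tokens per second scale roughly with bandwidth divided by model size. A back-of-envelope sketch comparing the M2 Max’s 400 GB/s against dual-channel DDR5-4800 system RAM (~76.8 GB/s), with an assumed ~70% bandwidth efficiency (an illustrative figure, not a benchmark):

```python
# Back-of-envelope inference speed: generation is memory-bandwidth bound,
# so tokens/s ~ bandwidth / bytes streamed per token (~ quantized model size).
# The 70% efficiency factor is an assumption, not a measured value.

def tokens_per_second(bandwidth_gbs: float, model_gb: float,
                      efficiency: float = 0.7) -> float:
    return bandwidth_gbs * efficiency / model_gb

model_7b_4bit_gb = 7e9 * 0.5 / 1e9  # ~3.5 GB of 4-bit weights
for name, bw in [("M2 Max unified memory", 400.0),
                 ("DDR5-4800 system RAM", 76.8)]:
    print(f"{name}: ~{tokens_per_second(bw, model_7b_4bit_gb):.0f} tok/s")
```

The absolute numbers are only estimates, but the ratio (roughly 5x) illustrates why falling back to system RAM hurts so much.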

If you’re open to sacrificing a bit of performance – roughly 7-8% in inference speed and about 20% in prompt processing speed – then the MacBook Pro with M1 Max is a viable alternative. It offers the same unified memory bandwidth as the M2 Max, but with a slightly lower GPU core count (32 versus 38). This difference, while noticeable, isn’t drastic. For instance, when running a 7B 4-bit quantized model, the difference in inference speed is only about 5 tokens per second. So, for those who are budget-conscious or don’t need the absolute peak of performance, the M1 Max stands as a solid choice, balancing cost with capability.

Windows and Linux-Based Laptops for LLM

If you’re looking to step outside the Apple ecosystem and are in the market for a Windows or Linux-based laptop, there are several options to consider: the RTX 3080 (16GB), RTX 3080 Ti (16GB), RTX 4080 (12GB), or RTX 4090 (16GB).

It’s essential to understand that the maximum VRAM you’ll typically find in a PC-based laptop is 16GB. This capacity is adequate for models up to 13B. However, for anything beyond that, you’ll need to split the load between VRAM and RAM to manage the higher memory demands. This split, unfortunately, leads to a significant reduction in inference speed.
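The slowdown from splitting is easy to model: each token still has to stream every layer’s weights, so the layers left in slow system RAM dominate the per-token time. A rough sketch of this effect (the efficiency factor and sizes are illustrative assumptions):

```python
# Sketch of why splitting a model between VRAM and system RAM hurts speed:
# every token streams all weights, and the portion served from slow system
# RAM dominates total time. All figures here are illustrative assumptions.

def split_tokens_per_second(model_gb: float, vram_gb: float,
                            vram_bw: float, ram_bw: float,
                            efficiency: float = 0.7) -> float:
    gpu_part = min(model_gb, vram_gb)       # weights resident in VRAM
    cpu_part = model_gb - gpu_part          # overflow into system RAM
    seconds_per_token = (gpu_part / (vram_bw * efficiency)
                         + cpu_part / (ram_bw * efficiency))
    return 1 / seconds_per_token

model_33b_4bit = 33 * 0.5 * 1.2  # ~19.8 GB with ~20% overhead
print(f"fully in 48 GB unified memory: "
      f"~{split_tokens_per_second(model_33b_4bit, 48, 400, 76.8):.1f} tok/s")
print(f"split across 16 GB VRAM + RAM: "
      f"~{split_tokens_per_second(model_33b_4bit, 16, 576, 76.8):.1f} tok/s")
```

Even though the mobile RTX 4090’s VRAM is faster than the M2 Max’s unified memory, the few gigabytes that spill into system RAM drag the overall rate below the all-in-memory case.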

Among these options, mobile GPUs like the RTX 4080, boasting a bandwidth of 432.0 GB/s, can offer speeds comparable to the M2 Max in terms of tokens per second. However, the RTX 4080 is somewhat limited with its 12GB of VRAM, making it most suitable for running a 13B 6-bit quantized model, but without much leeway for larger contexts. To get closer to the MacBook Pro’s capabilities, you might want to consider laptops with an RTX 3080 Ti or RTX 4090.

In particular, the mobile RTX 4090, with its 576 GB/s bandwidth, offers slightly better inference speeds than the RTX 4080. It also comes with 16GB of VRAM, allowing it to handle up to 13B models with more context space than the RTX 4080 can provide.

This positions RTX 4090-equipped laptops as a closer, albeit not perfect, alternative to the MacBook Pro in terms of LLM performance, keeping in mind the limitations when compared to Apple’s unified memory system.

Here are the best PC-based laptops for LLM inference:

ASUS ROG Zephyrus G14 2023

ASUS ROG Zephyrus G14 2023, large language model compatible

Asus has made some intriguing changes this year, especially with the GPU upgrade. Let’s break down the key hardware aspects and how they fit into the realm of LLM (Large Language Model) inference:

GPU – Nvidia RTX 4090 Mobile: This is a significant upgrade from AMD GPUs. For LLM tasks, the RTX 4090, even in its mobile form, is a powerhouse thanks to its high memory bandwidth (576 GB/s). It’s crucial to note, though, that the mobile version won’t match the desktop RTX 4090’s full capabilities. The 4090 is overkill for LLMs like LLaMA-13B, which requires at least 10GB of VRAM, but it’s future-proof and well suited for even larger models, where you will need to split the model layers between VRAM and RAM.

CPU – Ryzen 9 7940HS: A solid choice for LLM tasks. The CPU is essential for data loading, preprocessing, and managing prompts. The Ryzen 9 7940HS, being a high-end CPU, should handle these tasks efficiently.

RAM – 64GB: This laptop sits comfortably above the minimum for running models like the 30B, which require at least 20GB of memory. The memory is DDR5-4800 (76.8 GB/s), the most common setup in a 64GB configuration.

MSI Raider GE68HX 13VI

MSI Raider GE68HX 13VI, LLM compatible

The MSI Raider GE68, with its powerful CPU and GPU, ample RAM, and high memory bandwidth, is well equipped for LLM inference tasks.

CPU – Intel Core i9-13950HX: This is a high-end processor, excellent for tasks like data loading, preprocessing, and handling prompts in LLM applications. The increased performance over previous generations should be beneficial for running LLMs efficiently.

GPU – Nvidia RTX 4090 Mobile (576.0 GB/s bandwidth): This GPU is great, especially for LLM tasks up to 13B models. The high memory bandwidth is crucial for handling large models efficiently. Although it’s a mobile version and might not reach the peak performance of its desktop counterpart, it’s still significantly powerful for running LLMs.

RAM – 64GB of DDR5-5200 (83.2 GB/s memory bandwidth): The ample RAM and high memory bandwidth are ideal for LLM tasks. This amount of RAM surpasses the minimum requirements for most LLM models, ensuring smooth operation even with larger contexts.

Allan Witt


Allan Witt is co-founder and editor-in-chief. Computers and the web have fascinated me since I was a child. In 2011 I started training as an IT specialist at a medium-sized company and launched a blog at the same time. I really enjoy blogging about tech. After successfully completing my training, I worked as a system administrator at the same company for two years. As a part-time job I started tinkering with pre-built PCs and building custom gaming rigs at a local hardware shop. The desire to build PCs full-time grew stronger, and now this is my full-time job.

