Choosing Hardware for Running Mistral LLM (7B) Locally


Hey everyone,

Today, I’m diving into the nuts and bolts of setting up your PC to run one of the most intriguing Large Language Models (LLMs) out there: Mistral, a 7-billion-parameter model. At the time of writing, Mistral stands as the premier 7B model available. Various fine-tuned versions, such as Zephyr, Dolphin, and OpenHermes, perform exceptionally well in role-playing and coding scenarios.

If you’re looking to run Mistral in your local environment, you’ve come to the right place. Let’s get into the hardware specifics you’ll need to make this happen.

GPU: The Heart of Your Setup

First things first, the GPU. Mistral, being a 7B model, requires a minimum of 6GB VRAM for pure GPU inference. Pure GPU inference means the model weights are loaded entirely into GPU memory, which gives the fastest possible inference speed.

For running Mistral locally on a GPU, a good choice is the RTX 3060 in its 12GB VRAM variant. With 12GB of VRAM you can run the model with 5-bit quantization and still have room for a larger context size.
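To see roughly where these VRAM figures come from, here is a minimal sketch that estimates the size of the quantized weights alone. It assumes a nominal 7 billion parameters and a flat bits-per-weight figure; real quantized files are somewhat larger because of metadata and mixed-precision layers, and the function name is just for illustration.

```python
# Rough estimate of the memory taken by quantized model weights.
# Assumption: a nominal 7e9 parameters and a flat bits-per-weight figure;
# real quantized files carry extra metadata and mixed-precision layers.

def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 7e9  # nominal parameter count, an assumption for illustration

for bits in (4, 5, 8, 16):
    print(f"{bits:>2}-bit: ~{weight_size_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the 5-bit weights come to a bit over 4 GB, which is why a 12GB card still has several gigabytes left for the context (KV cache) and runtime overhead.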

Alternatives like the GTX 1660, RTX 2060, AMD 5700 XT, or RTX 3050 can also do the trick, as long as they pack at least 6GB VRAM.

Better GPUs are also an option, such as the RTX 4060 Ti with 16GB VRAM. However, for local LLM inference, the best choice is the RTX 3090 with 24GB of VRAM. If you find it second-hand at a reasonable price, it’s a great deal; it can efficiently run a 33B model entirely on the GPU with very good speed.

GPU inference speed of Mistral 7B model with different GPUs:

GPU model             Memory bandwidth   VRAM    Inference speed*
GeForce RTX 3060      360 GB/s           12GB    ~59 tokens/s
GeForce RTX 4060 Ti   288 GB/s           16GB    ~44 tokens/s
GeForce RTX 4070      504 GB/s           12GB    ~70 tokens/s
GeForce RTX 3090      935 GB/s           24GB    ~120 tokens/s
GeForce RTX 4090      1008 GB/s          24GB    ~140 tokens/s

*The speed will also depend on system load. 
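The table tracks memory bandwidth quite closely, and a back-of-the-envelope calculation shows why. The sketch below assumes that generating each token reads every weight once from VRAM, so tokens per second cannot exceed bandwidth divided by model size. The 4.4 GB model size is my assumption (roughly a 5-bit 7B model); the quantization used for the measurements above is not stated, so treat the results as a loose ceiling, not a prediction.

```python
# Back-of-the-envelope ceiling on token generation speed.
# Assumption: generating each token reads every weight once from VRAM,
# so tokens/s cannot exceed memory bandwidth divided by model size.

def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_SIZE_GB = 4.4  # assumed size of a ~5-bit 7B model, for illustration only

# Memory bandwidth in GB/s, taken from the table above.
gpus = {
    "RTX 3060": 360,
    "RTX 4060 Ti": 288,
    "RTX 4070": 504,
    "RTX 3090": 935,
    "RTX 4090": 1008,
}

for name, bandwidth in gpus.items():
    ceiling = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{name:<12} ceiling ~{ceiling:.0f} tokens/s")
```

Real-world numbers land below this ceiling because of compute time, kernel overhead, and context processing, but the ordering of the cards follows their bandwidth.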

CPU requirement

Moving on to the CPU – it’s crucial but plays a supporting role to the GPU. For running Mistral, CPUs like the Intel Core i9-10900K, i7-12700K, or Ryzen 9 5900X are more than capable. But if you’re pushing the limits, consider something like an AMD Ryzen Threadripper 3990X, boasting 64 cores and 128 threads.

RAM requirements

The amount of RAM is important, especially if you don’t have a GPU or you need to split the model between the GPU and CPU.

For pure CPU inference of Mistral’s 7B model, you will need a minimum of 16 GB RAM to avoid any performance hiccups. Because model inference is bound by memory bandwidth, it is better to choose faster memory, preferably DDR5.
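To put a number on memory speed, here is a small sketch of theoretical peak RAM bandwidth. Each memory channel is 64 bits (8 bytes) wide, so peak bandwidth is the transfer rate times 8 bytes times the number of channels. The two kits below are examples I picked for illustration, and sustained bandwidth in practice is lower than the theoretical peak.

```python
# Theoretical peak bandwidth of system memory.
# Each memory channel is 64 bits (8 bytes) wide, so
# bandwidth = transfer rate (MT/s) * 8 bytes * number of channels.

def ram_bandwidth_gb_s(mega_transfers_per_s: int, channels: int = 2) -> float:
    return mega_transfers_per_s * 1e6 * 8 * channels / 1e9

# Example kits chosen for illustration; they are not from the article.
for label, mts in (("DDR4-3200", 3200), ("DDR5-5600", 5600)):
    print(f"{label} dual channel: ~{ram_bandwidth_gb_s(mts):.1f} GB/s")
```

Even fast dual-channel DDR5 stays well under the 360 GB/s of an RTX 3060, which is why pure CPU inference is noticeably slower than GPU inference.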

If you’re dealing with higher-bit quantization (such as 8-bit) or a longer context size, bump that up to 32 GB.

System memory (RAM) required to run Mistral 7B with pure CPU inference:

Model quantization   RAM requirement   Overall memory requirement
Mistral 2-bit        4.5GB             12GB
Mistral 3-bit        5.2GB             12GB
Mistral 4-bit        6.5GB             16GB
Mistral 5-bit        7.6GB             16GB
Mistral 6-bit        8.8GB             16GB
Mistral 8-bit        11.5GB            32GB
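If you want to turn the table into a quick check for your own machine, a sketch like the following could work. The dictionary simply copies the "Overall memory requirement" column from the table above; the function name is hypothetical.

```python
# "Overall memory requirement" figures (GB) copied from the table above,
# keyed by quantization bit width.
OVERALL_RAM_GB = {2: 12, 3: 12, 4: 16, 5: 16, 6: 16, 8: 32}

def highest_quantization(installed_ram_gb: int):
    """Return the largest bit width whose overall requirement fits, or None."""
    fitting = [bits for bits, need in OVERALL_RAM_GB.items()
               if need <= installed_ram_gb]
    return max(fitting) if fitting else None

for ram in (8, 12, 16, 32):
    print(f"{ram} GB RAM -> {highest_quantization(ram)}")
```

A result of `None` means none of the listed quantizations fit comfortably in that amount of RAM according to the table.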

Storage requirements

For storage, you’ll want something fast and spacious. SSDs are the way to go, giving you quick access to data and enough space to store Mistral’s parameters and your datasets.

Cooling

Running LLMs can heat things up. Ensure your system has adequate cooling to maintain optimal performance. This means a good quality CPU cooler and effective case fans.

Power Supply

Last but not least, a reliable power supply unit (PSU) is vital. Given the hardware requirements, aim for something in the range of 600W to 650W for an RTX 3060 and 750W for an RTX 3090.

Final Thoughts

Setting up your system for Mistral LLM is an exciting venture. With the right hardware, you can unlock the model’s full potential right in your own home. Remember, it’s all about balance and ensuring each component complements the others for optimal performance.

 

Allan Witt


Allan Witt is co-founder and editor in chief of Hardware-corner.net. Computers and the web have fascinated me since I was a child. In 2011 I started training as an IT specialist in a medium-sized company and started a blog at the same time. I really enjoy blogging about tech. After successfully completing my training, I worked as a system administrator in the same company for two years. As a part-time job I started tinkering with pre-built PCs and building custom gaming rigs at a local hardware shop. The desire to build PCs full-time grew stronger, and now this is my full-time job.
