Choosing Hardware for Running Mistral LLM (7B) Locally

Last updated: Dec 28, 2023 | Author: Allan Witt

Hey everyone,

Today, I’m diving into the nuts and bolts of setting up your PC to run one of the most intriguing Large Language Models (LLMs) out there: Mistral 7B, a 7-billion-parameter model. At the time of writing, Mistral is one of the strongest 7B models available. Various fine-tuned versions, such as Zephyr, Dolphin, and OpenHermes, perform exceptionally well in role-playing and coding scenarios.

If you’re looking to run Mistral in your local environment, you’ve come to the right place. Let’s get into the hardware specifics you’ll need to make this happen.

GPU for Mistral LLM

First things first, the GPU. Mistral, being a 7B model, requires a minimum of 6GB VRAM for pure GPU inference with a quantized model. Pure GPU inference means all of the model weights are loaded into GPU memory, which gives the fastest possible inference speed.

For running Mistral locally on a GPU, a good pick is the 12GB variant of the RTX 3060. With 12GB of VRAM you can run the model with 5-bit quantization and still have room for a larger context size.

Alternatives like the GTX 1660, RTX 2060, AMD Radeon RX 5700 XT, or RTX 3050 can also do the trick, as long as they pack at least 6GB VRAM.

Better GPUs are also an option, such as the RTX 4060 Ti with 16GB VRAM. However, for local LLM inference, the best choice is the RTX 3090 with 24GB of VRAM. If you find it second-hand at a reasonable price, it’s a great deal; it can run a quantized 33B model entirely on the GPU at very good speed.
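As a rough sanity check on the VRAM figures above, here is a minimal Python sketch that adds up the two big consumers of GPU memory: the quantized weights and the key/value cache that grows with context size. The parameter count and the layer and head dimensions are my assumptions about Mistral 7B's architecture, not numbers from this article, and real quantization formats usually spend a bit more than their nominal bits per weight, so treat the results as a lower bound.

```python
# Rough VRAM estimate for a quantized Mistral 7B: weights + KV cache.
# Assumptions (not from the article): ~7.24e9 parameters, 32 layers,
# 8 key/value heads of dimension 128, and a 16-bit KV cache.
# Runtime buffers and activations are NOT included.

PARAMS = 7.24e9
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
GIB = 1024 ** 3


def weights_gib(bits_per_weight: float) -> float:
    """Memory taken by the weights alone at a given average bit width."""
    return PARAMS * bits_per_weight / 8 / GIB


def kv_cache_gib(context_tokens: int, bytes_per_value: int = 2) -> float:
    """Keys and values for every layer, for every token kept in context."""
    per_token_bytes = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token_bytes * context_tokens / GIB


def total_gib(bits_per_weight: float, context_tokens: int) -> float:
    """Weights plus KV cache, in GiB."""
    return weights_gib(bits_per_weight) + kv_cache_gib(context_tokens)


if __name__ == "__main__":
    for bits in (4, 5, 8, 16):
        print(f"{bits:>2}-bit, 8192-token context: {total_gib(bits, 8192):.1f} GiB")
```

Under these assumptions a 5-bit model with an 8192-token context fits comfortably in 12GB, while unquantized 16-bit weights do not, which is why quantization matters so much on consumer cards.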

GPU inference speed of Mistral 7B model with different GPUs:

GPU model	Memory bandwidth	VRAM	Inference speed*
GeForce RTX 3060	360 GB/s	12GB	~ 59 tokens/s
GeForce RTX 4060 Ti	288 GB/s	16GB	~ 44 tokens/s
GeForce RTX 4070	504 GB/s	12GB	~ 70 tokens/s
GeForce RTX 3090	935 GB/s	24GB	~ 120 tokens/s
GeForce RTX 4090	1008 GB/s	24GB	~ 140 tokens/s

*The speed will also depend on system load.
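The table lists memory bandwidth for a reason: generating each token requires reading roughly the whole model from VRAM, so bandwidth divided by model size gives a theoretical ceiling on tokens per second. The sketch below compares the measured speeds from the table with that ceiling. The model size is an assumption on my part (the benchmarked quantization is not stated here); about 4.4 GB roughly matches a 4-bit 7B model file.

```python
# Compare the article's measured speeds with a bandwidth-only ceiling.
# MODEL_GB is an ASSUMPTION: the benchmarked quantization is not stated;
# ~4.4 GB roughly matches a 4-bit 7B model file.

MODEL_GB = 4.4

# name -> (memory bandwidth in GB/s, measured tokens/s), from the table
GPUS = {
    "RTX 3060": (360, 59),
    "RTX 4060 Ti": (288, 44),
    "RTX 4070": (504, 70),
    "RTX 3090": (935, 120),
    "RTX 4090": (1008, 140),
}


def ceiling_tokens_per_s(bandwidth_gb_s: float, model_gb: float = MODEL_GB) -> float:
    """Upper bound if every token needs one full pass over the weights."""
    return bandwidth_gb_s / model_gb


def efficiency(bandwidth_gb_s: float, measured: float, model_gb: float = MODEL_GB) -> float:
    """Fraction of the bandwidth-only ceiling that was actually reached."""
    return measured / ceiling_tokens_per_s(bandwidth_gb_s, model_gb)


if __name__ == "__main__":
    for name, (bandwidth, measured) in GPUS.items():
        ceiling = ceiling_tokens_per_s(bandwidth)
        share = efficiency(bandwidth, measured)
        print(f"{name:<12} ceiling ~{ceiling:5.0f} tok/s, measured {measured} ({share:.0%})")
```

With this assumed model size, every card lands below its ceiling, and the ranking by bandwidth matches the ranking by measured speed, including the RTX 4060 Ti trailing the RTX 3060.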

CPU requirement

Moving on to the CPU: it’s crucial, but it plays a supporting role to the GPU. For running Mistral, CPUs like the Intel Core i9-10900K, i7-12700K, or Ryzen 9 5900X are more than capable. But if you’re pushing the limits, consider something like an AMD Ryzen Threadripper 3990X, boasting 64 cores and 128 threads.

RAM requirements

The amount of RAM is important, especially if you don’t have a GPU or you need to split the model between the GPU and CPU.
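If the model does not fit entirely in VRAM, tools such as llama.cpp let you offload only some of the layers to the GPU (its `--n-gpu-layers` option) and keep the rest in system RAM. Here is a minimal sketch for guessing a starting value. It assumes Mistral 7B's 32 transformer layers are roughly equal in size and sets aside a fixed VRAM reserve for the context cache and buffers; `layers_on_gpu` and the 1.5 GB reserve are hypothetical choices of mine, so expect to adjust the result by trial.

```python
# Estimate how many layers of the model fit on the GPU when VRAM is short.
# Assumptions: 32 transformer layers of roughly equal size, and a fixed
# VRAM reserve (hypothetical 1.5 GB) for the KV cache and scratch buffers.

N_LAYERS = 32


def layers_on_gpu(model_gb: float, vram_gb: float,
                  reserve_gb: float = 1.5, n_layers: int = N_LAYERS) -> int:
    """Number of whole layers that fit in the VRAM left after the reserve."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    model_gb = 5.1  # assumed size of a 5-bit model file
    for vram in (4, 6, 8, 12):
        n = layers_on_gpu(model_gb, vram)
        print(f"{vram:>2} GB VRAM: offload about {n} of {N_LAYERS} layers")
```

Whatever does not fit on the GPU stays in system RAM and runs on the CPU, which is where the RAM capacity and speed discussed in this section come into play.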

For pure CPU inference of Mistral’s 7B model you will need a minimum of 16 GB RAM to avoid any performance hiccups. Because model inference is bound by memory bandwidth, it is better to choose faster memory, preferably DDR5.

If you’re running a higher-precision quantization (such as 8-bit) or a longer context size, bump that up to 32 GB.
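To see why memory speed matters so much for CPU inference, you can estimate the peak bandwidth of your RAM and the token rate it allows. The sketch below uses the standard formula (transfers per second times 8 bytes per 64-bit channel times the number of channels). The DIMM speeds and the 4.4 GB model size are illustrative assumptions of mine, and real-world throughput will be below these theoretical peaks.

```python
# Peak theoretical bandwidth of system memory and the resulting ceiling
# on CPU token generation. DIMM speeds and model size below are
# ILLUSTRATIVE ASSUMPTIONS, not measurements from the article.

BYTES_PER_TRANSFER = 8  # one 64-bit memory channel moves 8 bytes per transfer


def peak_bandwidth_gb_s(mega_transfers_per_s: float, channels: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s for a given MT/s rating."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


def cpu_token_ceiling(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if each token reads the whole model once."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    model_gb = 4.4  # assumed size of a 4-bit 7B model
    for name, speed in (("DDR4-3200", 3200), ("DDR5-5600", 5600)):
        bandwidth = peak_bandwidth_gb_s(speed)
        ceiling = cpu_token_ceiling(bandwidth, model_gb)
        print(f"{name} dual channel: {bandwidth:.1f} GB/s, at most ~{ceiling:.0f} tok/s")
```

Even the faster dual-channel DDR5 setup offers only a fraction of the bandwidth of the GPUs listed earlier, which is why pure CPU inference is so much slower.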

System memory (RAM) required to run Mistral 7B with pure CPU inference:

Model quantization	RAM used by the model	Recommended total system RAM
Mistral 2-bit	4.5GB	12GB
Mistral 3-bit	5.2GB	12GB
Mistral 4-bit	6.5GB	16GB
Mistral 5-bit	7.6GB	16GB
Mistral 6-bit	8.8GB	16GB
Mistral 8-bit	11.5GB	32GB
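If you want to turn the table into a quick lookup, here is a small sketch that picks the highest quantization level whose overall memory requirement (the last column) fits in the RAM you have installed. The numbers are copied from the table; the function name `best_quantization` is just my own illustration.

```python
# Figures copied from the table above:
# bits -> (RAM used by the model in GB, overall memory requirement in GB)
RAM_TABLE = {
    2: (4.5, 12),
    3: (5.2, 12),
    4: (6.5, 16),
    5: (7.6, 16),
    6: (8.8, 16),
    8: (11.5, 32),
}


def best_quantization(installed_ram_gb: float):
    """Highest bit width whose overall requirement fits, or None if none do."""
    fitting = [
        bits
        for bits, (_, overall_gb) in RAM_TABLE.items()
        if overall_gb <= installed_ram_gb
    ]
    return max(fitting, default=None)


if __name__ == "__main__":
    for ram in (8, 12, 16, 32):
        choice = best_quantization(ram)
        label = f"{choice}-bit" if choice is not None else "nothing in the table"
        print(f"{ram} GB RAM: {label}")
```

Higher bit widths generally preserve more of the model's quality, so picking the largest level that fits is a sensible default.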

Storage requirements

For storage, you’ll want something fast and spacious. SSDs are the way to go, giving you quick access to data and enough space to store Mistral’s parameters and your datasets.

Cooling

Running LLMs can heat things up. Ensure your system has adequate cooling to maintain optimal performance. This means a good quality CPU cooler and effective case fans.

Power Supply

Last but not least, a reliable power supply unit (PSU) is vital. Given the hardware requirements, aim for something in the range of 600W to 650W for an RTX 3060 and 750W for an RTX 3090.

Final Thoughts

Setting up your system for Mistral LLM is an exciting venture. With the right hardware, you can unlock the model’s full potential right in your own home. Remember, it’s all about balance and ensuring each component complements the others for optimal performance.

Allan Witt

Allan Witt is co-founder and editor in chief of Hardware-corner.net. Computers and the web have fascinated me since I was a child. In 2011 I started training as an IT specialist in a medium-sized company and started a blog at the same time. I really enjoy blogging about tech. After successfully completing my training, I worked as a system administrator in the same company for two years. As a part-time job I started tinkering with pre-built PCs and building custom gaming rigs at a local hardware shop. The desire to build PCs full-time grew stronger, and now this is my full-time job.
