Meta has just released Llama 4, the latest generation of its open large language model family – and this time, they’re swinging for the fences. With two variants – Llama 4 Scout and Llama 4 Maverick – Meta is introducing a model architecture based on...
G.Skill just dropped a major announcement that should catch the eye of every LLM tinkerer and local inference enthusiast: two new high-end DDR5 kits, one at DDR5-8000 with 128 GB capacity (2x64GB), and another at a blistering DDR5-9000 with 64 GB capacity (2x32GB)....
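For CPU-only inference, single-stream decode speed is roughly bounded by memory bandwidth divided by the bytes of weights read per token, which is why faster DDR5 matters. A minimal back-of-envelope sketch of what these kits could mean (the model footprints are illustrative assumptions, not figures from the announcement):

```python
# Rough upper bound on CPU-only decode speed: tokens/s ~= memory bandwidth / bytes per token.
# Bandwidth figures are theoretical dual-channel peaks; sustained real-world numbers are lower.

def dual_channel_bandwidth_gbs(mt_per_s: int, bus_bytes: int = 8, channels: int = 2) -> float:
    """Theoretical bandwidth in GB/s for a DDR5 kit (8 bytes per channel per transfer)."""
    return mt_per_s * bus_bytes * channels / 1000

def tokens_per_second(bandwidth_gbs: float, model_gb: float) -> float:
    """Each generated token reads (roughly) every active weight once."""
    return bandwidth_gbs / model_gb

for kit_mt in (8000, 9000):
    bw = dual_channel_bandwidth_gbs(kit_mt)
    # Illustrative model footprints (assumed): an 8B model at Q8 (~8 GB) and a 32B model at Q4 (~18 GB).
    for name, size_gb in (("8B @ Q8", 8.0), ("32B @ Q4", 18.0)):
        print(f"DDR5-{kit_mt}: ~{bw:.0f} GB/s -> {name}: ~{tokens_per_second(bw, size_gb):.1f} tok/s (upper bound)")
```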
AI enthusiasts looking for top-tier performance in local LLMs have long considered NVIDIA’s H100 to be the gold standard for inference, thanks to its high-bandwidth HBM3 memory and optimized tensor cores. However, recent benchmarks show that a dual RTX 5090...
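On paper, the comparison largely comes down to aggregate memory capacity and bandwidth, since single-stream decoding is memory-bound. A hedged sketch using commonly published spec figures (treat them as approximations; real throughput also depends on interconnect overhead and software, and ideal tensor-parallel scaling is assumed for the dual-card setup):

```python
# Rough spec comparison: two RTX 5090s vs one H100 for memory-bound decoding.
# Figures are approximate published specs and assumptions, not benchmark results.

configs = {
    "2x RTX 5090": {"vram_gb": 2 * 32, "bandwidth_gbs": 2 * 1792},  # GDDR7, ~1.8 TB/s per card
    "1x H100 SXM": {"vram_gb": 80,     "bandwidth_gbs": 3350},      # HBM3, ~3.35 TB/s
}

model_gb = 40  # assumed: a 70B model at ~4-bit quantization

for name, spec in configs.items():
    fits = "fits" if spec["vram_gb"] >= model_gb else "does not fit"
    bound = spec["bandwidth_gbs"] / model_gb
    print(f"{name}: {spec['vram_gb']} GB VRAM ({fits}), "
          f"~{bound:.0f} tok/s memory-bandwidth bound (ideal scaling assumed)")
```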
Local LLM inference is advancing rapidly, and for enthusiasts willing to push the limits, AMD’s EPYC platform is proving to be a compelling option. A recent test of DeepSeek V3 (671B parameters, 37B active MoE) on a dual-EPYC setup with 768GB DDR5-5600MHz memory...
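The reason a 671B MoE is even viable on CPU is that capacity scales with total parameters while per-token reads scale only with the ~37B active parameters. A hedged back-of-envelope sketch (channel population, quantization, and efficiency figures are assumptions; real throughput typically lands well below this bound due to NUMA and compute overheads):

```python
# Back-of-envelope for MoE inference on a dual-EPYC, 24-channel DDR5-5600 system.
# Total parameters set capacity needs; only the ~37B active parameters are read per token.

CHANNELS = 24            # assumed: 12 channels per socket, both sockets fully populated
MT_PER_S = 5600
BYTES_PER_TRANSFER = 8

peak_bw_gbs = CHANNELS * MT_PER_S * BYTES_PER_TRANSFER / 1000   # ~1075 GB/s theoretical
sustained_bw_gbs = peak_bw_gbs * 0.6                             # assumed ~60% efficiency across NUMA nodes

total_params_b = 671     # capacity: ~336 GB at 4-bit, fits in 768 GB RAM
active_params_b = 37     # read per generated token
bytes_per_param = 0.5    # assumed ~4-bit quantization

capacity_gb = total_params_b * bytes_per_param
active_gb = active_params_b * bytes_per_param

print(f"Weights at 4-bit: ~{capacity_gb:.0f} GB (fits in 768 GB)")
print(f"Loose decode upper bound: ~{sustained_bw_gbs / active_gb:.0f} tok/s "
      f"(~{sustained_bw_gbs:.0f} GB/s sustained / {active_gb:.1f} GB active per token)")
```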
The landscape of local AI inference is evolving rapidly, with compact mini-PCs attempting to bridge the gap between affordability and high-performance computing. GMKtec has officially priced its EVO-X2 SFF/Mini-PC at ~$2,000, positioning it as a potential option for...
The first benchmarks for the RTX 5090 Mobile GPU are out, and the results are promising for on-the-go LLM inference. Hardware Canucks ran early tests on a Razer Blade 16 laptop equipped with a 135W RTX 5090 GPU, revealing significant performance gains over the RTX...
While the official GPU market often leaves high-VRAM enthusiasts wanting more unless they step into pricey data center territory, the hardware modding scene in China continues to innovate. Reports and reviews, including a recent one from the Russian tech channel МК,...
Apple’s latest Mac Studio, particularly the M3 Ultra variant configured with a staggering 512GB of unified memory, presents a unique proposition for local Large Language Model (LLM) enthusiasts. This massive memory pool theoretically allows running models far...
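A quick capacity check shows why this configuration is interesting: the entire unified pool is addressable by the GPU, so model size is limited mainly by RAM rather than a discrete VRAM ceiling. A hedged sketch (quantized sizes are rough bytes-per-parameter rules of thumb, and the ~75% GPU-allocatable figure is an assumption about macOS's default Metal working-set limit, which can be raised):

```python
# Which quantized model footprints fit in a 512 GB unified-memory Mac Studio?
# Sizes are rough rules of thumb (bytes per parameter * parameter count), not exact file sizes.

TOTAL_GB = 512
GPU_USABLE_GB = TOTAL_GB * 0.75   # assumed default Metal working-set limit; can be raised

models = {
    "Llama 3.3 70B @ Q8":    70 * 1.0,
    "Llama 3.1 405B @ Q4":  405 * 0.5,
    "DeepSeek V3 671B @ Q4": 671 * 0.5,
}

for name, size_gb in models.items():
    status = "fits" if size_gb <= GPU_USABLE_GB else "too large"
    print(f"{name}: ~{size_gb:.0f} GB -> {status} (usable ~{GPU_USABLE_GB:.0f} GB)")
```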
If you’re looking to get into local LLM inference, choosing the right GPU isn’t just about raw power—it’s about finding the best balance between VRAM, memory bandwidth, and price-to-performance efficiency. Unlike gaming, where factors like clock speeds and...
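One simple way to frame that balance is cost per GB of VRAM and cost per GB/s of memory bandwidth, since single-stream decode throughput is largely bandwidth-bound. A minimal sketch with placeholder prices (the dollar figures are illustrative assumptions, not recommendations; the spec figures are published values):

```python
# Crude figure-of-merit for LLM inference GPUs: what you pay per GB of VRAM
# and per GB/s of memory bandwidth. Prices are placeholder assumptions.

gpus = [
    # name,       vram_gb, bandwidth_gbs, assumed_price_usd
    ("RTX 3090",  24,       936,           800),
    ("RTX 4090",  24,      1008,          1800),
    ("RTX 5090",  32,      1792,          2500),
]

for name, vram, bw, price in gpus:
    print(f"{name}: ${price / vram:.0f} per GB VRAM, ${price / bw:.2f} per GB/s bandwidth")
```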
As enthusiasts of local LLM inference and hardware performance, the moment we saw Nvidia’s Project G-Assist, one question immediately came to mind: how does it run under the hood? While Nvidia’s official materials emphasize its gaming-focused features, we dug...
As enthusiasts of local LLM inference and hardware performance, the moment we saw Nvidia’s Project G-Assist, one question immediately came to mind: how much VRAM does it consume while answering your questions? Today, we’re diving deep into G-Assist’s...
The updated DeepSeek V3 checkpoint (v3-0324) was just released, and the first benchmarks on Apple’s Mac Studio M3 Ultra are surfacing online. While most mainstream publications focus on token generation speeds, real-world workloads often involve large context...
As AI models grow larger and more demanding, the need for high-VRAM GPUs has never been greater. Running a 70B-parameter model like Llama 3.3 with a long context (Llama 3.3 supports a 128K-token context window) demands substantial VRAM even in a 4-bit quantized setup. While NVIDIA’s newly...
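To put a number on that: at 4-bit the weights alone are roughly half a byte per parameter, and the KV cache grows linearly with context length. A rough sketch (the KV-cache figures assume a Llama-3-style 70B with grouped-query attention, 80 layers, 8 KV heads of dimension 128, stored in FP16; treat it as an approximation, not an exact accounting of any one runtime):

```python
# VRAM estimate for a 70B model at 4-bit with a long context.

params_b        = 70e9
bytes_per_param = 0.5          # ~4-bit quantization
weights_gb = params_b * bytes_per_param / 1e9

layers, kv_heads, head_dim = 80, 8, 128                      # Llama-3-style 70B (assumed)
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2    # K and V, FP16

for ctx in (8_192, 32_768, 131_072):
    kv_gb = ctx * kv_bytes_per_token / 1e9
    print(f"context {ctx:>7,}: weights ~{weights_gb:.0f} GB + KV cache ~{kv_gb:.1f} GB "
          f"= ~{weights_gb + kv_gb:.0f} GB")
```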
NVIDIA’s latest professional workstation GPU, the RTX Pro 6000, has arrived with a spec sheet that firmly cements it as a Titan-class card. With its high core count, extensive memory capacity, and a power budget that pushes the limits of PCIe 5.0, the RTX Pro 6000...
After releasing the specifications for its first system capable of running 70B models locally, NVIDIA has officially unveiled the RTX PRO 6000 Blackwell Workstation Edition, a high-performance GPU aimed at professional AI workloads and large-scale...
After months of speculation and anticipation, NVIDIA has finally unveiled the full specifications for its DGX Spark workstation (formerly known as Project DIGITS), aimed at AI developers and enthusiasts who want to run large language models locally. With a starting...
In the world of AI, the demand for local inference of large language models (LLMs) is growing. Home users and AI enthusiasts are looking for compact systems capable of running powerful models, such as quantized versions of Llama 3.1 70B, without the need for expensive...
AMD’s Ryzen AI MAX+ 395 (Strix Halo) brings a unique approach to local AI inference, offering a massive memory allocation advantage over traditional desktop GPUs like the RTX 3090, 4090, or even the upcoming 5090. While initial benchmarks suggest that running a 70B...
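The advantage is capacity rather than speed: Strix Halo's unified LPDDR5X pool lets the iGPU address far more memory than a 24-32 GB discrete card, but its bandwidth is much lower, so 70B decoding stays bandwidth-bound. A rough sketch of the trade-off (allocation limits, bandwidth, and the model footprint are approximate platform figures and assumptions, not benchmark results):

```python
# Capacity vs bandwidth trade-off: Strix Halo iGPU vs a discrete RTX 4090.
# Figures are approximate platform specs / assumptions, not measured results.

systems = {
    "Ryzen AI MAX+ 395 (96 GB to GPU)": {"mem_gb": 96, "bandwidth_gbs": 256},   # 256-bit LPDDR5X-8000
    "RTX 4090":                         {"mem_gb": 24, "bandwidth_gbs": 1008},
}

model_gb = 40  # assumed: a 70B model at ~4-bit quantization

for name, s in systems.items():
    fits = "fits entirely" if s["mem_gb"] >= model_gb else "needs offloading"
    print(f"{name}: {fits}; decode upper bound ~{s['bandwidth_gbs'] / model_gb:.0f} tok/s")
```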
The world of local AI has just been flipped on its head, and you won’t BELIEVE which tech giant is leading the charge! Forget cramming multiple power-hungry NVIDIA GPUs into your rig just to touch the edge of massive language models. Apple’s brand new Mac...
NVIDIA appears to be taking steps to manage the supply of its next-generation GPUs, potentially in response to continued shortages and pricing concerns. The company has revived a limited-access purchase system, reminiscent of past product launches, but with a few...