Hardware Insights

  • Feb. 26, 2026 / Hardware Insights

    Qwen3.5 27B and Qwen3.5 35B: What Hardware Do You Actually Need? (GPU Benchmarks Inside)

    Qwen3.5 27B fits comfortably on a 24 GB GPU at up to 131k context in 4-bit, but becomes memory-heavy at 262k. Qwen3.5 35B MoE in 4-bit is the more practical long-context model for 24 GB cards, and it is significantly faster in token generation despite having more total parameters. VRAM is still the main constraint,...

    rtx 3090 on a test bench running qwen 3.5 35b MoE
  • Feb. 4, 2026 / Hardware Insights

    Qwen3 Coder Next 80B A3B: what it takes to run it locally

    Direct answer first: Qwen3 Coder Next 80B A3B is one of the most hardware-friendly 80B-class coding models released so far. Thanks to its MoE design with roughly 3B active parameters, a single high-VRAM GPU can run it at full 256k context, and even dual consumer GPUs can handle the 3-bit version comfortably. VRAM, not raw...

    qwen3 coder next building pc for local use
  • Jan. 26, 2026 / Hardware Insights

    Best Computers for Running ClawdBot (OpenClaw) AI Assistant Locally

    If you are running OpenClaw with a cloud model like Claude Opus, you do not need powerful hardware. Any modern low-power system with 8 GB of RAM and a 6th-gen or newer Intel CPU is enough. If you want to run ClawdBot fully local with reliable tool usage and large context windows, hardware requirements scale...

    clawdbot cli interface with different computer builds
  • Jan. 22, 2026 / Hardware Insights

    We Tested GLM-4.7 Flash 30B MoE — Here’s the GPU You Actually Need

    Z.ai released GLM 4.7 Flash only a few days ago, but meaningful local testing had to wait. The initial llama.cpp support was incomplete, and without proper fixes it was not possible to measure real performance. Those fixes have now landed, and with the latest llama.cpp build we were finally able to test the model properly...

    glm 4.7 flash tested on rtx 5090 rtx 3090 with llm
  • Jan. 20, 2026 / Hardware Insights

    How I Test GPUs for Local LLMs Before I Buy One

    Learn how I test GPUs for local LLM inference before buying, using real workflows, llama.cpp, and rented RTX 3090 instances to measure VRAM, context length, and performance.

    gpu with instance renting for testing with llm
  • Jan. 19, 2026 / Hardware Insights

    Ryzen AI Halo Is Not New Hardware – It’s AMD’s Strix Halo AI Developer Platform

    AMD Ryzen AI Halo is being marketed as a new local AI development solution, but it is important to be precise about what it actually is. Ryzen AI Halo does not introduce new silicon, new performance characteristics, or a faster variant of Strix Halo. It is a reference mini PC platform built around the already...

    amd ryzen ai halo mini pc for local llm
  • Dec. 11, 2025 / Hardware Insights

    We Tested Devstral 2 (24B & 123B) — Here’s the Hardware You Actually Need

    Mistral AI has just released its new coding model, Devstral 2. We’ve been using its predecessor, Devstral Small, locally for code completion and have been very impressed with its performance. Early reports on Devstral 2 put it on par with other top models like Kimi K2 and DeepSeek V3.2, so we were eager to get...

    devstral 2 llm hardware options gpus laptops mini pc
  • Dec. 9, 2025 / Hardware Insights

    Best Unified Memory Computers for Local LLMs (2025): Bandwidth, Memory Size, Speed & Price Comparison

    Unified memory has become one of the most important features for anyone running local LLMs in 2025. Instead of splitting memory between CPU RAM and GPU VRAM, unified architectures pool it into one high-bandwidth space that both the CPU and GPU can access. This matters because LLM inference is memory-bound long before it becomes compute-bound....

    computer models with unified memory for local llm
  • Nov. 17, 2025 / Hardware Insights

    Best Black Friday 2025 GPU Deals for Local LLM Users

    We’re tracking GPUs that make sense for LLM workloads and monitoring their prices from now through Black Friday 2025. They are grouped by VRAM, since memory capacity determines which models and context lengths a card can run, while bandwidth plays a major role in real-world throughput.

    llm capable gpus on discount on black friday