VRAM
-
Feb. 26, 2026 / Hardware Insights
Qwen3.5 27B and Qwen3.5 35B: What Hardware Do You Actually Need? (GPU Benchmarks Inside)
Qwen3.5 27B fits comfortably on a 24 GB GPU up to 131k context in 4-bit, but becomes memory heavy at 262k. Qwen3.5 35B MoE in 4-bit is the more practical long-context model for 24 GB cards, and it is significantly faster in token generation despite having more total parameters. VRAM is still the main constraint,...
-
Feb. 4, 2026 / Hardware Insights
Qwen3 Coder Next 80B A3B: what it takes to run it locally
Direct answer first: Qwen3 Coder Next 80B A3B is one of the most hardware-friendly 80B-class coding models released so far. Thanks to its MoE design with roughly 3B active parameters, a single high-VRAM GPU can run it at full 256k context, and even dual consumer GPUs can handle the 3-bit version comfortably. VRAM, not raw...
-
Sep. 10, 2025 / LLM Benchmarks
Can Three RTX 3090s Really Run GPT-OSS 120B with Max Context? I Put It to the Test
After testing the gpt-oss-20B model on a single RTX 3090, I had to push things further and see what the new heavyweight could do. In addition to the 20B model, OpenAI also released gpt-oss-120B, a massive 120-billion parameter open-weight Mixture-of-Experts (MoE) model with 5.1 billion active parameters. I first ran some experiments on an RTX...