VRAM

  • Feb. 26, 2026 / Hardware Insights

    Qwen3.5 27B and Qwen3.5 35B: What Hardware Do You Actually Need? (GPU Benchmarks Inside)

    Qwen3.5 27B fits comfortably on a 24 GB GPU up to 131k context in 4-bit, but becomes memory heavy at 262k. Qwen3.5 35B MoE in 4-bit is the more practical long-context model for 24 GB cards, and it is significantly faster in token generation despite having more total parameters. VRAM is still the main constraint,...

    rtx 3090 on a test bech runnign qwen 3.5 35b MoE
  • Feb. 4, 2026 / Hardware Insights

    Qwen3 Coder Next 80B A3B: what it takes to run it locally

    Direct answer first: Qwen3 Coder Next 80B A3B is one of the most hardware-friendly 80B-class coding models released so far. Thanks to its MoE design with roughly 3B active parameters, a single high-VRAM GPU can run it at full 256k context, and even dual consumer GPUs can handle the 3-bit version comfortably. VRAM, not raw...

    qwen3 coder next building pc for local use
  • Sep. 10, 2025 / LLM Benchmarks

    Can Three RTX 3090s Really Run GPT-OSS 120B with Max Context? I Put It to the Test

    After testing the gpt-oss-20B model on a single RTX 3090, I had to push things further and see what the new heavyweight could do. In addition to the 20B model, OpenAI also released gpt-oss-120B, a massive 120-billion parameter open-weight Mixture-of-Experts (MoE) model with 5.1 billion active parameters. I first ran some experiments on an RTX...

    three rtx 3090 gpus connected for inference on llm