24GB and 18GB GPU Options for Local LLMs May Be Coming: Rumored RTX 5080 SUPER and 5070 SUPER Hint at VRAM Gains for Inference
The latest whispers from the hardware grapevine suggest NVIDIA might be preparing SUPER variants for its RTX 50 series, specifically an RTX 5080 SUPER and an RTX 5070 SUPER. While mid-generation refreshes are standard practice, these rumored SKUs are particularly noteworthy for local LLM practitioners due to their potential VRAM configurations, reportedly leveraging higher-density 3GB GDDR7 memory modules.
This development isn’t entirely out of the blue. We first saw these 3GB modules deployed in the mobile RTX 5090 Laptop GPU and the professional RTX 6000 Ada Generation successor (likely the RTX Pro 6000 based on Blackwell). These modules allow for greater memory capacity on established memory bus widths. For instance, placing eight 3GB modules on a 256-bit bus yields 24GB, precisely the configuration seen on the RTX 5090 Laptop GPU (which uses the GB203 chip, same as the desktop RTX 5080).
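For readers who want to sanity-check these capacity figures, the arithmetic is straightforward: each GDDR7 module sits on a 32-bit channel, so the module count is the bus width divided by 32, and total capacity is that count times the per-module density. A minimal sketch reproducing the configurations discussed in this article:

```python
# GDDR7 modules each occupy a 32-bit channel, so:
# module count = bus width / 32, capacity = modules * per-module density.
MODULE_INTERFACE_BITS = 32

def vram_capacity_gb(bus_width_bits: int, module_density_gb: int) -> int:
    """Total VRAM for a given bus width and per-module density (in GB)."""
    modules = bus_width_bits // MODULE_INTERFACE_BITS
    return modules * module_density_gb

print(vram_capacity_gb(256, 3))  # 24 GB -> RTX 5090 Laptop / rumored RTX 5080 SUPER
print(vram_capacity_gb(256, 2))  # 16 GB -> desktop RTX 5080
print(vram_capacity_gb(192, 3))  # 18 GB -> rumored RTX 5070 SUPER
print(vram_capacity_gb(192, 2))  # 12 GB -> RTX 5070
```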
The RTX 5080 SUPER
The rumor mill suggests the RTX 5080 SUPER will adopt this exact configuration: the GB203 GPU paired with 24GB of GDDR7 memory across its 256-bit bus. This is significant. Many enthusiasts recall the initial rumors and even accidental board partner confirmations pointing towards a 24GB configuration for the original RTX 5080. However, it’s unclear if this was ever truly part of NVIDIA’s plans.
From a local LLM perspective, the rumored 24GB VRAM capacity of the RTX 5080 SUPER is the headline feature. This VRAM pool significantly expands the horizons for model deployment on consumer-grade hardware, potentially enabling comfortable operation of complex 32B parameter models on a single card or allowing enthusiasts to tackle demanding 70B models (like Llama 3 70B Q4_0 requiring ~38GB) by splitting the workload across a dual-card setup.
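For a rough sense of which models fit where, weight memory is approximately parameter count times bits-per-weight divided by 8, plus overhead for the KV cache and runtime buffers. A back-of-the-envelope sketch (the ~4.5 effective bits per weight for Q4_0 and the 20% overhead factor are illustrative assumptions, not measured values):

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB for a quantized model."""
    return params_b * bits_per_weight / 8

def fits(params_b: float, bits_per_weight: float, vram_gb: float,
         overhead: float = 1.2) -> bool:
    """Very rough fit check; the overhead factor (KV cache, buffers)
    is an illustrative assumption, not a measured figure."""
    return weights_gb(params_b, bits_per_weight) * overhead <= vram_gb

print(round(weights_gb(32, 4.5), 1))  # ~18 GB for a 32B model at ~4.5 bits/weight
print(fits(32, 4.5, 24))              # True  -> comfortable on a single 24GB card
print(round(weights_gb(70, 4.5), 1))  # ~39 GB, in line with the ~38GB cited above
print(fits(70, 4.5, 24))              # False -> needs a second card or CPU offload
```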
Equally compelling is the potential price point. Should NVIDIA follow its historical precedent for SUPER refreshes (typically matching or slightly undercutting the price of the SKU it supersedes), an RTX 5080 SUPER landing near the standard 5080’s $999 MSRP would mark a significant milestone: the first new NVIDIA consumer GPU to offer 24GB of VRAM under the $1,000 mark, drastically lowering the cost barrier compared to sourcing an RTX 4090 or navigating the used market for RTX 3090s.
Beyond capacity, memory bandwidth remains paramount for achieving acceptable token generation speeds during inference. The base RTX 5080 already sets a high bar with 960 GB/s from its 16GB of 30 Gbps GDDR7, and notably, its use of 32 Gbps-rated modules (Samsung K4VAF325ZC-SC32) allows overclocking headroom up to 36 Gbps (yielding ~1152 GB/s, a 20% uplift). The specific memory configuration and resulting bandwidth of the 24GB SUPER variant remain unknown, but its performance in this critical metric will be a key factor in its overall appeal for LLM inference tasks.
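The bandwidth figures quoted above fall out of simple arithmetic: peak bandwidth in GB/s equals bus width in bits times per-pin data rate in Gbps, divided by 8. A quick sketch reproducing the numbers in this paragraph:

```python
def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) * data rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8

print(bandwidth_gbs(256, 30))  # 960.0 GB/s  -> stock RTX 5080
print(bandwidth_gbs(256, 36))  # 1152.0 GB/s -> ~20% uplift from a 36 Gbps overclock
```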
The RTX 5070 SUPER
More unusual is the rumored RTX 5070 SUPER. This card is speculated to pair the GB205 GPU with a 192-bit memory bus; populated with six 3GB GDDR7 modules, that works out to 18GB of VRAM.
For the value-conscious LLM builder, that proposed 18GB GDDR7 configuration may be the most intriguing part of these rumors. The unusual capacity positions the card as an excellent candidate for dual-GPU systems, offering a combined 36GB VRAM pool. Such a setup could exceed the VRAM capacity of even flagship cards like the RTX 4090 (24GB) or the RTX 5090 (32GB), likely at a significantly lower total investment.
This opens up practical pathways for running demanding 70B-class models (where Q4_0 quantization requires approximately 38GB, making 36GB a very close and workable target, potentially with minimal offloading) or experimenting with even larger models. As a single card, 18GB occupies an interesting niche, providing a necessary VRAM uplift over standard 16GB cards for specific models or quantization levels without demanding the price premium associated with 24GB solutions.
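To make the “minimal offloading” point concrete, here is a hedged sketch of estimating how many transformer layers would stay on GPU versus spill to CPU for a roughly 38GB Q4_0 70B model split across two 18GB cards. The 80-layer count matches Llama 3 70B; the per-card reserve for KV cache and buffers is an assumed value for illustration:

```python
def gpu_layer_split(model_gb: float, n_layers: int, cards_gb: list[float],
                    reserve_gb: float = 1.5) -> tuple[int, int]:
    """Estimate how many layers fit on GPU vs. spill to CPU.

    reserve_gb per card (KV cache, buffers) is an illustrative assumption.
    """
    per_layer_gb = model_gb / n_layers
    usable = sum(max(card - reserve_gb, 0) for card in cards_gb)
    on_gpu = min(n_layers, int(usable / per_layer_gb))
    return on_gpu, n_layers - on_gpu

on_gpu, on_cpu = gpu_layer_split(38.0, 80, [18.0, 18.0])
print(on_gpu, on_cpu)  # ~69 layers on GPU, ~11 offloaded to CPU under these assumptions
```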
However, the memory bandwidth, a crucial factor for inference speed (tokens/second), remains a significant unknown and hinges on the underlying silicon. The standard RTX 5070 uses the GB205 GPU with a 192-bit bus and 12GB of 28 Gbps GDDR7, delivering 672 GB/s. It’s currently unclear whether the 18GB RTX 5070 SUPER would retain this GB205 foundation; if so, it would likely keep the 192-bit bus and roughly 672 GB/s of bandwidth (assuming 28 Gbps memory). Alternatively, speculation exists that it might utilize a cut-down version of the larger GB203 chip (potentially earmarked for an RTX 5070 Ti), which could grant it access to a wider 256-bit memory bus.
If paired with the same 28 Gbps GDDR7 modules, a 256-bit bus would elevate the bandwidth substantially to 896 GB/s, offering a significant boost in LLM performance. The final choice of GPU and corresponding memory interface will be a key determinant of the RTX 5070 SUPER’s true value proposition for local inference tasks.
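Because single-stream decoding is largely memory-bandwidth bound, a common rough ceiling on tokens per second is bandwidth divided by the bytes read per token (approximately the model’s weight footprint). A hedged sketch comparing the two possible 5070 SUPER configurations against a hypothetical ~13.5GB quantized model (the model size is an illustrative assumption):

```python
def max_tokens_per_sec(bandwidth_gbs: float, model_gb: float) -> float:
    """Rough decode-speed ceiling: each generated token reads ~all weights once."""
    return bandwidth_gbs / model_gb

MODEL_GB = 13.5  # e.g. a ~24B model at ~4.5 bits/weight; illustrative only
print(round(max_tokens_per_sec(672, MODEL_GB)))  # ~50 tok/s on a 192-bit / 28 Gbps config
print(round(max_tokens_per_sec(896, MODEL_GB)))  # ~66 tok/s on a 256-bit / 28 Gbps config
```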
Gamer Indifference vs. LLM Enthusiasm
From a pure gaming perspective, these SUPER cards might seem less exciting. A potential RTX 5080 SUPER, while carrying more VRAM, would likely deliver gaming performance similar to an RTX 4080 SUPER or base RTX 5080 at a still-high price, without matching the raw power of the original RTX 4090. More VRAM helps in some gaming scenarios, but the core performance uplift might not justify the cost for gamers.
However, for the local LLM community, VRAM is king, and bandwidth is queen. The prospect of 24GB under $1000 (5080S) and a highly efficient 18GB card perfect for dual-GPU 36GB setups (5070S) is genuinely exciting news, even if purely speculative at this stage.
Comparison Table
Here’s how these rumored cards might stack up against their non-SUPER counterparts (SUPER specs are TBD/Rumored):
| Graphics Card Name | Memory Capacity | Memory Bus | Memory Speed | Bandwidth | Est. TBP | Est. Price |
|---|---|---|---|---|---|---|
| RTX 5080 SUPER (Rumor) | 24 GB GDDR7 | 256-bit | TBD | TBD | TBD | ~$999 US? |
| NVIDIA GeForce RTX 5080 | 16 GB GDDR7 | 256-bit | 30 Gbps | 960 GB/s | 360W | $999 US |
| RTX 5070 SUPER (Rumor) | 18 GB GDDR7 | 192-bit? | TBD | TBD | TBD | ~$599 US? |
| NVIDIA GeForce RTX 5070 | 12 GB GDDR7 | 192-bit | 28 Gbps | 672 GB/s | 250W | $549 US |
(Note: Prices for SUPER variants are purely speculative based on historical trends and rumors. TBP/TDP and final memory speeds are unknown.)
The Path Forward for LLM Builders
If these rumors pan out, the RTX 50 SUPER series could significantly reshape the landscape for building cost-effective, high-VRAM local inference machines.
- Upgrade Path: Owners of 12GB or 16GB cards (like the RTX 3060 12GB, 4070 Ti, or even the base 5080) looking for more VRAM capacity without breaking the bank might find compelling options here.
- New Builds: The potential sub-$1000 24GB RTX 5080 SUPER could become the go-to single-card solution for serious LLM work. The RTX 5070 SUPER might become the default choice for budget-conscious dual-GPU builds targeting the 30-40GB VRAM range.
- Value Proposition: Everything hinges on final pricing and confirmed specifications. If NVIDIA maintains the historical SUPER pricing strategy, these cards could offer exceptional VRAM-per-dollar, a critical metric for our community.
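Since VRAM-per-dollar is the metric highlighted above, here is a quick sketch comparing the cards from the table. The SUPER prices are the speculative figures quoted in this article, not confirmed MSRPs:

```python
# VRAM (GB) and price (USD) per the comparison table; SUPER prices are rumored.
cards = {
    "RTX 5080 SUPER (rumor)": (24, 999),
    "RTX 5080":               (16, 999),
    "RTX 5070 SUPER (rumor)": (18, 599),
    "RTX 5070":               (12, 549),
}

for name, (vram_gb, price_usd) in cards.items():
    print(f"{name}: {vram_gb / price_usd * 1000:.1f} GB per $1000")
# The rumored 5070 SUPER comes out ahead (~30 GB/$1000) under these assumptions.
```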
Conclusion
While these RTX 50 SUPER rumors should be taken with a grain of salt until officially confirmed, they represent a potentially significant development for local LLM enthusiasts. The focus on increased VRAM capacity – 24GB for the 5080 SUPER and a novel 18GB for the 5070 SUPER – directly addresses the primary bottleneck for running larger and more capable language models locally. Keep a close eye on official announcements and subsequent performance benchmarks, particularly memory bandwidth tests and power consumption figures, as these cards could redefine the mid-range to high-end segment for homebrew AI hardware.