RTX 5070 Ti Super Rumored With 24GB VRAM – Plus New Details on RTX 5080 Super and 5070 Super for Local LLMs

Allan Witt • Jun 30, 2025 at 4:50am PDT

Fresh hardware rumors have emerged surrounding NVIDIA’s Blackwell architecture, expanding on the potential SUPER refresh we discussed in our previous analysis, “Local LLM 24GB and 18GB GPU Options Emerge.” The latest information not only corroborates the existence of the RTX 5080 Super and RTX 5070 Super but also introduces a new and potentially crucial SKU for local LLM enthusiasts: the RTX 5070 Ti Super. For anyone building systems dedicated to LLM inference, these potential specifications warrant a detailed analysis, as they directly address the three most critical metrics: VRAM capacity, memory bandwidth, and the ultimate arbiter, price.

While these specifications remain unconfirmed until an official announcement from NVIDIA, their implications are significant enough to plan for. The introduction of higher-density 3GB GDDR7 memory modules appears to be the enabling technology, allowing for increased VRAM capacities on established memory bus widths – a development that could reshape the price-to-performance landscape for running quantized language models.

The 24GB Contenders: RTX 5080 Super and RTX 5070 Ti Super

The most compelling development for builders seeking a powerful single-card solution is the emergence of two distinct SKUs rumored to feature 24GB of VRAM. Both the RTX 5080 Super and the newly rumored RTX 5070 Ti Super are speculated to be built on the GB203 silicon, using a 256-bit memory bus populated with eight 3GB modules to reach 24GB. This VRAM target is the sweet spot for comfortably running 32B parameter models. Larger 70B models at 4-bit quantization still exceed 24GB, so on a single card they call for either a more aggressive quantization or partial offloading to system memory.

The primary differentiator between these two cards, beyond CUDA core counts, will be memory bandwidth. The RTX 5080 Super is rumored to be equipped with faster 32 Gbps GDDR7 memory. Paired with its 256-bit bus, this configuration would yield a theoretical memory bandwidth of 1024 GB/s (32 Gbps × 256 bits ÷ 8). That places it in the upper echelon of consumer hardware, on par with the ~1008 GB/s of an RTX 4090 and ahead of the ~936 GB/s offered by the venerable RTX 3090.

Meanwhile, the RTX 5070 Ti Super is speculated to use more conservative 28 Gbps GDDR7 modules, which on the same 256-bit bus works out to a still-formidable 896 GB/s. That makes the 5070 Ti Super a fascinating value proposition: it would offer the same critical 24GB VRAM capacity as its more powerful sibling at a potentially lower price, with the trade-off being a 12.5% reduction in memory bandwidth and a lower core count. For many inference workloads where VRAM capacity is the hard gatekeeper, this trade-off could be highly advantageous from a performance-per-dollar standpoint.
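The bandwidth figures above follow from simple arithmetic: per-pin data rate times bus width, divided by eight bits per byte. The short Python sketch below reproduces them. The function name is our own, and the inputs are the rumored, unconfirmed memory speeds, so treat the output as a sanity check rather than a measured result.

```python
def memory_bandwidth_gbs(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s.

    pin_rate_gbps  -- per-pin data rate in Gbit/s (rumored value)
    bus_width_bits -- total memory bus width in bits
    """
    return pin_rate_gbps * bus_width_bits / 8  # 8 bits per byte


# Rumored configurations; both cards share a 256-bit bus.
rtx_5080_super = memory_bandwidth_gbs(32, 256)
rtx_5070_ti_super = memory_bandwidth_gbs(28, 256)

# Relative bandwidth deficit of the 5070 Ti Super versus the 5080 Super.
gap = 1 - rtx_5070_ti_super / rtx_5080_super

print(f"RTX 5080 Super:    {rtx_5080_super:.0f} GB/s")
print(f"RTX 5070 Ti Super: {rtx_5070_ti_super:.0f} GB/s")
print(f"Bandwidth gap:     {gap:.1%}")
```

The same function can be reused for any other bus width and memory speed combination that surfaces in later leaks.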

The Unique 18GB Option: RTX 5070 Super

Perhaps the most unique SKU in this rumored lineup is the RTX 5070 Super. It is expected to be based on the smaller GB205 GPU, featuring a 192-bit memory bus. By populating this bus with six 3GB GDDR7 modules, it achieves an 18GB VRAM capacity. Coupled with rumored 28 Gbps memory, its bandwidth would calculate to 672 GB/s, identical to the standard RTX 5070.
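To see why 3GB modules are the enabling piece, here is a rough sketch of the capacity math. It assumes one memory module per 32-bit slice of the bus (consistent with six modules on a 192-bit bus) and no clamshell layout. The constant and function names are illustrative, not anything from NVIDIA.

```python
# Assumption: each GDDR7 module occupies a 32-bit slice of the memory bus,
# and the board does not double up modules in a clamshell layout.
BITS_PER_MODULE = 32


def vram_capacity_gb(bus_width_bits: int, module_gb: int) -> int:
    """VRAM capacity in GB for a given bus width and per-module density."""
    modules = bus_width_bits // BITS_PER_MODULE
    return modules * module_gb


for label, bus in [("GB203, 256-bit", 256), ("GB205, 192-bit", 192)]:
    with_2gb = vram_capacity_gb(bus, 2)
    with_3gb = vram_capacity_gb(bus, 3)
    print(f"{label}: {with_2gb}GB with 2GB modules, {with_3gb}GB with 3GB modules")
```

Under these assumptions the bus width stays the same and only the module density changes, which is how a refresh could raise capacity without a new die or a wider bus.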

While 18GB may seem like an odd capacity, it holds a strategic position for system builders. As a single card, it provides a meaningful step up from 12GB and 16GB cards, enabling the use of models or quantization levels that are just out of reach for lower-capacity hardware. However, its true potential lies in multi-GPU configurations. A pair of RTX 5070 Super cards would create a combined VRAM pool of 36GB. This setup is nearly perfect for running 70B-class models like Llama 3 70B in a q4_0 quantization, which requires approximately 38GB of VRAM. A 36GB pool gets exceptionally close, requiring only minimal and manageable offloading to system RAM, and could be assembled for a total cost significantly below that of a single flagship card.
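For a feel of how small that offload would be, the sketch below estimates the shortfall using the article's ~38GB figure. It makes two loud simplifications: the weights are assumed to be spread evenly across Llama 3 70B's 80 transformer layers, and KV cache, context buffers, and driver overhead are ignored, so real-world numbers will be somewhat worse. The function name is hypothetical.

```python
import math


def offload_estimate(model_gb: float, vram_per_card_gb: float,
                     cards: int, n_layers: int):
    """Return (pooled VRAM in GB, shortfall in GB, layers left on the CPU).

    Assumptions: weights are split evenly across layers, and nothing but
    weights lives in VRAM (no KV cache, no runtime overhead).
    """
    pool = vram_per_card_gb * cards
    shortfall = max(0.0, model_gb - pool)
    per_layer_gb = model_gb / n_layers
    layers_on_cpu = math.ceil(shortfall / per_layer_gb)
    return pool, shortfall, layers_on_cpu


# Two rumored 18GB RTX 5070 Super cards, ~38GB model, 80 layers.
pool, shortfall, cpu_layers = offload_estimate(38, 18, 2, 80)
print(f"Pool: {pool}GB, shortfall: {shortfall:.1f}GB, "
      f"layers on CPU: {cpu_layers} of 80")

# For comparison: a single 24GB card with the same model.
_, single_shortfall, single_cpu_layers = offload_estimate(38, 24, 1, 80)
print(f"Single 24GB card: shortfall {single_shortfall:.1f}GB, "
      f"layers on CPU: {single_cpu_layers} of 80")
```

In a llama.cpp-style setup, the number of layers kept on the GPUs would be the total minus the CPU-resident count; leaving headroom for context means offloading a few more layers than this lower bound suggests.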

Price and Performance-per-Dollar

The ultimate viability of these cards for the cost-conscious enthusiast hinges entirely on their launch price. At present, the benchmark for value in the high-VRAM space is the second-hand market, where an RTX 3090 with 24GB of VRAM and 936 GB/s of bandwidth can be acquired for $750 to $800. Any new product from NVIDIA must be competitive with this established baseline.

As for pricing, SUPER variants have historically launched at or near the price point of the models they effectively replace. If NVIDIA follows that pattern, an RTX 5080 Super launching near the RTX 5080’s $999 MSRP would be a landmark event, offering next-generation architecture and superior bandwidth over a used 3090 for a premium of roughly $200.

The real battle for value will likely be fought by the RTX 5070 Ti Super and 5070 Super. These cards feel like a course correction for the VRAM limitations of their non-SUPER counterparts. The market has shown clear resistance to 12GB cards in the $550 price bracket, making an 18GB RTX 5070 Super a far more compelling proposition if priced similarly. The RTX 5070 Ti Super, with its 24GB of VRAM, could become the default choice for new builds if it lands in the $800 to $900 range, offering a new card with a full warranty as a compelling alternative to the used RTX 3090.

Graphics Card	GPU Die	VRAM	Bandwidth	TGP	Price (Speculated)
RTX 5080 Super	GB203	24GB GDDR7	1024 GB/s	415W	~$999
RTX 5070 Ti Super	GB203	24GB GDDR7	896 GB/s	350W	~$799
RTX 5070 Super	GB205	18GB GDDR7	672 GB/s	275W	~$599
RTX 3090	GA102	24GB GDDR6X	936 GB/s	350W	~$800 (used)
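One quick way to compare the rows above is dollars per gigabyte of VRAM and dollars per GB/s of bandwidth. The sketch below does that with the table's figures. All prices are speculative (or used-market estimates for the RTX 3090), and the dictionary layout and function name are our own.

```python
# (VRAM in GB, bandwidth in GB/s, price in USD) taken from the table above.
# Prices are speculated, except the RTX 3090, which is a used-market estimate.
cards = {
    "RTX 5080 Super": (24, 1024, 999),
    "RTX 5070 Ti Super": (24, 896, 799),
    "RTX 5070 Super": (18, 672, 599),
    "RTX 3090 (used)": (24, 936, 800),
}


def value_metrics(vram_gb: int, bandwidth_gbs: int, price_usd: int):
    """Return (USD per GB of VRAM, USD per GB/s of bandwidth)."""
    return price_usd / vram_gb, price_usd / bandwidth_gbs


# Rank cards from cheapest to most expensive per GB of VRAM.
ranked = sorted(cards.items(), key=lambda item: value_metrics(*item[1])[0])

for name, specs in ranked:
    per_gb, per_gbs = value_metrics(*specs)
    print(f"{name:<18} ${per_gb:6.2f}/GB VRAM   ${per_gbs:5.2f} per GB/s")
```

At these speculated prices the two lower-tier SUPER cards and the used RTX 3090 land within a few cents of each other per gigabyte, so small shifts in the final MSRP would change the ranking.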

Upgrade Paths and Final Thoughts

If these rumors materialize, the Blackwell SUPER refresh will introduce clear and logical upgrade paths for LLM practitioners. Builders with 12GB or 16GB cards will have several options to increase VRAM capacity without resorting to flagship-tier pricing. The RTX 5070 Ti Super could become the new single-card workhorse for those who prioritize VRAM above all else, while the RTX 5070 Super could become the foundation for the most cost-effective dual-GPU 36GB inference systems available.

Until NVIDIA provides official confirmation, this analysis remains speculative. However, the consistent direction of these rumors – toward correcting VRAM deficiencies and leveraging higher-density memory – is an encouraging sign. The focus for the local LLM community should remain fixed on the final, confirmed specifications for memory bandwidth and, most importantly, the retail pricing. These factors will determine whether the RTX 50 SUPER series represents a true step forward in democratizing access to high-performance, on-premise LLM inference.