Apple Silicon M2 Ultra vs. Multi RTX 3090 Setup for Running Large Language Models


The advancement of large language model (LLM) technology demands increasingly powerful computing solutions. For running large 70B+ models on consumer-grade hardware, two prominent options have emerged: the Apple Silicon M2 Ultra and configurations using multiple RTX 3090 GPUs. Each setup offers unique advantages and drawbacks, making the choice not so straightforward. Here, we delve into the pros and cons to help you make an informed decision.

Pros of Apple Silicon M2 Ultra

  • Unified Memory System: Up to 192GB of high-bandwidth unified memory, of which roughly 180GB can be allocated as VRAM, crucial for handling large models at less aggressive quantization levels.
  • Energy Efficiency: Much lower power consumption compared to multiple RTX 3090s, making it more sustainable and cost-effective in the long run.
  • Ease of Setup: Simple to configure and get running, offering a significant time and effort saving, especially valuable in professional settings where setup time equates to costs.
  • Compact Design: Offers a streamlined, less bulky setup compared to a multi-GPU system, which can be advantageous in space-constrained environments.
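To see why that memory headroom matters, here is a rough back-of-the-envelope sketch (the `model_size_gb` helper is hypothetical, counts weights only, and ignores KV cache and runtime overhead):

```python
def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Weights-only memory estimate for a quantized model, in decimal GB.
    Ignores KV cache, activations, and runtime overhead."""
    bytes_total = params_b * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 70B model at common quantization levels:
for bits in (16, 8, 5, 4):
    print(f"70B @ {bits}-bit: ~{model_size_gb(70, bits):.0f} GB")
```

At 16-bit precision a 70B model needs roughly 140GB for weights alone, which fits in the M2 Ultra's unified memory but exceeds a triple-3090's 72GB; at 4-bit it drops to about 35GB and fits either setup comfortably.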

Cons of Apple Silicon M2 Ultra

  • CUDA Support: Lacks access to CUDA, limiting the support for AI applications that rely on NVIDIA’s ecosystem, which is a significant part of the AI development landscape.
  • Cost: Initial investment is higher for the M2 Ultra, making it less accessible for hobbyists or small-scale developers compared to the more affordable RTX 3090, especially in the used market.
  • Versatility: While powerful, the M2 Ultra’s performance in specific tasks, such as training or fine-tuning large language models, might not match the raw power and flexibility offered by NVIDIA GPUs.
  • Upgradeability: Unlike a multi-GPU setup where additional GPUs can be added for increased performance or memory, upgrading the M2 Ultra requires purchasing a new unit, limiting scalability.

Pros of Multi RTX 3090 Setup

  • High Performance: Offers great speed for LLM tasks, with the ability to handle long contexts on large models effectively.
  • Scalability: Easier to upgrade by adding more GPUs, offering a path to increased performance or memory capacity without needing to replace the entire system.
  • Flexibility: Supports a wide range of AI applications through CUDA, making it a versatile choice for developers working across different AI projects.
  • Price: The RTX 3090 is the more budget-friendly option for LLM inference. A setup with 72GB of VRAM using three RTX 3090s can be assembled for around $3,600.

Cons of Multi RTX 3090 Setup

  • Power Consumption: Significantly higher energy usage, leading to increased operational costs and a larger environmental footprint.
  • Complexity and Bulk: Requires a more complicated setup process and results in a bulkier system. Assembling a triple or quadruple RTX 3090 build, for example, is considerably more difficult and time-consuming for novice PC builders.
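The power-consumption gap also shows up in the electricity bill. A minimal sketch of the annual running cost, with assumed figures (~350W per 3090 under load, ~100W for an M2 Ultra during inference, $0.15/kWh) that will vary by workload and region:

```python
def annual_energy_cost(watts: float, hours_per_day: float,
                       usd_per_kwh: float) -> float:
    """Yearly electricity cost for a given average power draw."""
    kwh_per_year = watts / 1000 * hours_per_day * 365
    return kwh_per_year * usd_per_kwh

# Assumptions: 3x350W GPUs plus ~150W for the rest of the system,
# vs ~100W for an M2 Ultra, both running 8 hours/day at $0.15/kWh
triple_3090_cost = annual_energy_cost(3 * 350 + 150, 8, 0.15)
m2_ultra_cost = annual_energy_cost(100, 8, 0.15)
print(f"Triple 3090: ~${triple_3090_cost:.0f}/yr, "
      f"M2 Ultra: ~${m2_ultra_cost:.0f}/yr")
```

Under these assumptions the multi-GPU rig costs several hundred dollars more per year to run, which narrows the upfront price gap over time.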

In summary, the choice between the Apple Silicon M2 Ultra and a multi-GPU RTX 3090 setup depends on your specific needs, including performance requirements, energy efficiency, cost, and the LLM applications being developed.