PELADN Enters the Strix Halo Arena with YO1 Mini-PC for Local LLM

The landscape for high-density, on-premise AI hardware is evolving rapidly, driven almost single-handedly by the arrival of AMD's Ryzen AI Max 300 "Strix Halo" series. For the enthusiast dedicated to running large language models locally, these APUs represent a paradigm shift in performance-per-watt and memory capacity within a small form factor. Adding to this burgeoning market, the semi-custom hardware brand PELADN has unveiled its YO1, a Mini-PC built around the flagship Strix Halo APU, aiming to capture the attention of the technically astute, price-conscious user.

Core Hardware and System Configuration

At the heart of the PELADN YO1 is the top-tier Ryzen AI Max+ 395 processor. This APU pairs 16 of AMD's latest Zen 5 CPU cores with a potent Radeon 8060S integrated GPU featuring 40 RDNA 3.5 Compute Units. It is the most powerful configuration in the Strix Halo family, ensuring maximum computational throughput for both the CPU and the iGPU.

For local LLM inference, the standout feature is the system’s 128GB of LPDDR5X memory, specified to run at 8000 MT/s. This soldered memory is a key enabler for loading very large models. PELADN also notes that the APU can operate within three distinct TDP envelopes: a power-efficient 55W, a balanced 85W, and a performance-oriented 120W. It remains unclear whether these power modes are managed via a software utility or a physical switch, a practical detail that will be important for users looking to fine-tune their system’s thermal and acoustic profile.

The Unified Memory and Bandwidth Equation for LLM Inference

The primary appeal of Strix Halo for LLM enthusiasts is its unified memory architecture. The 128GB of system RAM can be partitioned to allocate a vast amount of VRAM to the integrated Radeon 8060S. In a Windows environment, users can typically assign up to 96GB for GPU tasks, while Linux distributions often allow a more generous allocation, pushing towards 110GB. This massive VRAM pool is the critical factor that allows 70-billion-parameter-class models to be loaded entirely into the GPU's addressable memory, eliminating the performance-crippling latency of swapping to system RAM or NVMe storage.
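
Whether a given model is feasible ultimately comes down to whether its quantized weights, plus some working overhead, fit inside that allocatable pool. The following is a minimal back-of-the-envelope sketch, assuming generic bits-per-weight figures rather than any specific quantization format:

```python
# Rough feasibility check: does a quantized model fit in the iGPU's
# allocatable unified-memory pool? All figures are illustrative estimates.

def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1B params at 8 bits ~= 1 GB).
    KV cache and runtime buffers add a few GB on top of this."""
    return params_billions * bits_per_weight / 8

ALLOCATABLE_GB = {"Windows": 96, "Linux": 110}  # typical ceilings cited above

for name, params_b, bpw in [
    ("Llama 3.1 8B @ ~4-bit", 8, 4.5),
    ("70B dense @ ~4-bit", 70, 4.25),
    ("70B dense @ ~8-bit", 70, 8.5),
]:
    gb = weights_gb(params_b, bpw)
    verdicts = ", ".join(
        f"{os_name}: {'fits' if gb <= cap else 'too large'}"
        for os_name, cap in ALLOCATABLE_GB.items()
    )
    print(f"{name}: ~{gb:.1f} GB weights -> {verdicts}")
```

By this estimate even an 8-bit 70B model (~74GB of weights) stays inside the 96GB Windows ceiling, which is precisely the headroom that sets Strix Halo apart from conventional discrete GPUs.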

However, VRAM capacity is only half of the story; memory bandwidth is the linchpin for token generation speed. The Ryzen AI Max+ 395 uses a 256-bit wide memory interface. Paired with LPDDR5X-8000 memory, this configuration yields a theoretical peak memory bandwidth of 256 GB/s. That throughput is substantial for an integrated solution and is fundamental to achieving usable inference speeds on demanding models. For perspective, it is roughly 70% of a discrete NVIDIA GeForce RTX 3060's 360 GB/s, but paired with nearly ten times the usable VRAM.
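
The arithmetic behind that 256 GB/s figure is straightforward: bus width in bytes multiplied by the transfer rate. A quick verification using the standard peak-bandwidth formula:

```python
# Theoretical peak bandwidth = (bus width in bytes) x (transfers per second).
bus_width_bits = 256         # Strix Halo memory interface
transfer_rate_mt_s = 8000    # LPDDR5X-8000

bandwidth_gb_s = (bus_width_bits / 8) * transfer_rate_mt_s / 1000
print(f"{bandwidth_gb_s:.0f} GB/s")  # 256 GB/s
```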

Real-World Performance Expectations

With up to 110GB of effective VRAM and 256 GB/s of bandwidth, a system like the PELADN YO1 is positioned to handle a wide range of quantized models. For instance, a 4-bit quantized version of the Llama 3.1 8B model, requiring only 5GB of VRAM, would be exceptionally fast, likely achieving speeds around 36 tokens per second. The system’s true value, however, is demonstrated with larger models.
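
That figure lines up with a common rule of thumb: when token generation is memory-bandwidth-bound, every generated token requires streaming the full set of weights through the GPU once, so the theoretical ceiling is bandwidth divided by model size. A sketch of that reasoning, using the numbers above:

```python
# Bandwidth-bound ceiling: tokens/s <= memory bandwidth / bytes read per token.
# For a dense model, roughly the whole weight file is read for every token.
bandwidth_gb_s = 256
model_gb = 5                          # Llama 3.1 8B at 4-bit, per the text

ceiling = bandwidth_gb_s / model_gb   # ~51 tokens/s theoretical maximum
measured = 36                         # figure cited above
print(f"ceiling ~{ceiling:.0f} tok/s, measured {measured} "
      f"(~{measured / ceiling:.0%} of peak)")  # ~70% yield is realistic
```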

A 4-bit quantized DeepSeek Llama 3 70B distill (DeepSeek R1 Distill Llama 70B), which consumes approximately 37GB of VRAM, can be loaded entirely into memory with room to spare. On comparable Strix Halo hardware, this configuration has been shown to produce around 5 tokens per second, a very respectable speed for a model of this size in a compact system. Perhaps more impressively, Mixture-of-Experts (MoE) models like the Qwen3 30B MoE can achieve significantly higher throughput. Because only a small subset of experts is activated for each token, far fewer weights have to move across the memory bus, and in a 4-bit quantization the model delivers over 50 tokens per second, making it highly interactive for real-world use. Even running the same MoE model at a higher-precision 8-bit quantization (consuming 31GB of VRAM) still yields a fluid 41 tokens per second.
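
The same roofline logic explains the MoE numbers: the ceiling depends on the parameters touched per token, not the total parameter count. The sketch below assumes roughly 3B active parameters for the Qwen3 30B MoE (its "A3B" variant name refers to approximately that; treat the figure as an approximation):

```python
# MoE vs dense under the same bandwidth-bound model: the ceiling depends on
# parameters *touched* per token, not total parameters. Figures are rough.
bandwidth_gb_s = 256

def ceiling_tok_s(active_params_b: float, bits_per_weight: float) -> float:
    gb_per_token = active_params_b * bits_per_weight / 8  # weights streamed per token
    return bandwidth_gb_s / gb_per_token

# Dense 70B at ~4-bit: all ~37 GB of weights are streamed for every token.
print(f"70B dense: ~{ceiling_tok_s(70, 4.25):.0f} tok/s ceiling (measured ~5)")
# Qwen3 30B MoE at ~4-bit with ~3B active parameters (assumed):
print(f"30B MoE:   ~{ceiling_tok_s(3, 4.25):.0f} tok/s ceiling (measured 50+)")
```

Routing overhead and shared (non-expert) layers keep real-world MoE throughput well below this naive ceiling, but the order-of-magnitude gap versus the dense 70B ceiling is exactly why the MoE model feels so much more interactive.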

Market Positioning and the Competitive Field

PELADN is not entering an empty field. The Strix Halo mini-PC market is becoming intensely competitive, which is excellent news for value-seeking enthusiasts. The YO1’s initial bulk pricing on Alibaba starts at $1850, placing it directly in contention with several other anticipated systems. This includes the recently announced Bosman M5 AI, which is aggressively priced at a promotional $1699. Other notable competitors we’ve covered include the Beelink GTR9 Pro AI, expected around the $1800 mark, and the GMKtec EVO-X2, which is positioned closer to $2000. Zotac has also joined the fray with a Magnus-branded offering.

While the YO1 is not the absolute cheapest entry point, its use of the flagship APU and 128GB of memory makes it a serious contender. For a builder considering their next upgrade, a system like this represents the pinnacle of what is currently possible for local LLMs without transitioning to a much larger, more power-hungry, and expensive multi-GPU desktop build. The final street price and retail availability will ultimately determine its place in a market where every dollar of performance counts.
