South Korea - Ekhbary News Agency
ISSCC 2026: Rebellions Unveils Industry's First Quad-Chiplet AI Solution with UCIe Interconnects, Claiming Rebel 100 AI Accelerator Matches Nvidia H200 Performance at a Lower Power Envelope
At the International Solid-State Circuits Conference (ISSCC) 2026, South Korean AI inference accelerator designer Rebellions made a significant technological announcement, detailing its Rebel 100 AI accelerator. The processor is billed as the industry's first to implement a quad-chiplet design interconnected using the Universal Chiplet Interconnect Express (UCIe) standard. Rebellions claims this architecture not only pushes the boundaries of performance but also achieves notable power efficiency, positioning the Rebel 100 as a formidable contender in the high-performance computing and AI acceleration landscape.
The emergence of multi-chiplet designs represents a pivotal shift in the semiconductor industry, particularly for high-performance AI and High-Performance Computing (HPC) accelerators. As the demand for computational power continues to surge, far outpacing the capabilities of traditional monolithic chip scaling, the multi-chiplet approach offers a compelling alternative. By breaking down complex processors into smaller, specialized chiplets that can be manufactured and assembled independently, companies can enhance yield, reduce costs, and accelerate time-to-market. Major players like AMD, Intel, and Nvidia have already embraced this methodology, integrating it into their latest CPU and GPU offerings, underscoring its strategic importance.
Central to Rebellions' Rebel 100 is the adoption of the UCIe interface, an industry standard designed to facilitate high-bandwidth, low-latency communication between chiplets. This interconnect technology is crucial for enabling disparate chiplets to function cohesively as a single, powerful processing unit. While UCIe has faced a gradual adoption curve, Rebellions' successful implementation at ISSCC 2026 highlights its potential and underscores the value of standardized interconnects in realizing the full promise of multi-chiplet architectures.
The Rebel 100 architecture is a testament to sophisticated engineering. It comprises four Neural Processing Unit (NPU) chiplets, each measuring 320 mm². Each NPU chiplet is paired with a 36 GB HBM3E memory stack, for a total of 144 GB of high-bandwidth memory per package. The chiplets are interconnected in a mesh topology, manufactured using Samsung's advanced SF4X process technology, and packaged with Samsung's I-CubeS advanced packaging solution, which includes a silicon interposer. To ensure robust power integrity and structural support, the System-in-Package (SiP) also incorporates four integrated silicon capacitor (ISC) dies.
Die-to-die communication is handled by a UCIe-Advanced interface operating at 16 Gbps per lane, delivering an aggregate bandwidth of 4 TB/s. The interconnect's low latency of approximately 11 ns (FDI-to-FDI) enables the SiP to present itself to the system as a unified processor rather than a collection of individual dies. For host connectivity, the Rebel 100 provides two PCIe 5.0 x16 interfaces, supporting advanced features like SR-IOV and peer-to-peer operations, easing integration into existing server infrastructure.
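The quoted aggregate figure can be sanity-checked with back-of-envelope arithmetic. Only the 16 Gbps per-lane rate and the 4 TB/s total come from the article; the module count and lanes-per-module below are illustrative assumptions (UCIe advanced-package modules are commonly 64 lanes wide), not disclosed figures:

```python
# Back-of-envelope check of the quoted ~4 TB/s aggregate UCIe bandwidth.
# GBPS_PER_LANE is from the article; LANES_PER_MODULE and MODULES_PER_PACKAGE
# are illustrative assumptions, not figures Rebellions has disclosed.

GBPS_PER_LANE = 16          # per-lane signalling rate (from the article)
LANES_PER_MODULE = 64       # assumption: UCIe advanced-package module width
MODULES_PER_PACKAGE = 32    # assumption: total die-to-die modules in the SiP

aggregate_gbps = GBPS_PER_LANE * LANES_PER_MODULE * MODULES_PER_PACKAGE
aggregate_tbytes = aggregate_gbps / 8 / 1000  # bits -> bytes, Gb -> TB

print(f"{aggregate_tbytes:.1f} TB/s")  # ~4.1 TB/s, in line with the claim
```

Any lane/module split with the same product would yield the same total; the point is only that the per-lane rate and the aggregate figure are mutually consistent.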
Rebellions makes bold claims regarding the Rebel 100's performance. The company states that a single Rebel 100 SiP delivers 2 PFLOPS of FP8 or 1 PFLOPS of FP16 compute without sparsity, within a power envelope of 600 W. This is presented as a direct comparison to Nvidia's H200, which achieves similar performance levels at a higher power consumption of 700 W. Furthermore, Rebellions reports an inference throughput of 56.8 tokens per second on the Llama 3.3 70B model for specific input/output sequence lengths, although these figures originate from the vendor and await independent verification. The primary focus of the ISSCC presentation was to elucidate the operational mechanics of this pioneering multi-chiplet UCIe-based AI accelerator.
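The efficiency advantage implied by those vendor numbers is straightforward to compute. Both data points are the claims reported above (similar FP8 throughput, 600 W versus 700 W), not independently verified measurements:

```python
# Performance-per-watt comparison implied by the vendor's claim: similar
# dense FP8 throughput to an H200, but at 600 W instead of 700 W.
# All inputs are claimed/approximate figures from the article.

rebel_pflops, rebel_watts = 2.0, 600.0   # Rebel 100, FP8 dense (claimed)
h200_pflops, h200_watts = 2.0, 700.0     # H200, FP8 dense (approximate)

rebel_eff = rebel_pflops * 1000 / rebel_watts  # TFLOPS per watt
h200_eff = h200_pflops * 1000 / h200_watts

print(f"Rebel 100: {rebel_eff:.2f} TFLOPS/W, H200: {h200_eff:.2f} TFLOPS/W")
print(f"Implied advantage: {(rebel_eff / h200_eff - 1) * 100:.0f}%")
```

At equal throughput, the 100 W lower envelope works out to roughly a 17% perf-per-watt edge, on the vendor's own numbers.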
The company envisions the Rebel 100 quad-chiplet package as a foundational element for larger, cross-node, and rack-level systems designed to tackle the most demanding tasks, including trillion-parameter models and million-token contexts. While specific plans for larger SiPs are not detailed, Rebellions anticipates partners building scale-up and scale-out clusters comprising tens to tens of thousands of these accelerators. Each chiplet houses two Neural Core Clusters, each with eight neural cores and 32 MB of shared memory, featuring an aggregate bandwidth of 64 TB/s. The intricate mesh topology, comprising 64 routers, ensures efficient data flow across the chiplet and, by extension, the entire SiP.
The network-on-chip (NoC) employs an XY routing scheme, a standard technique to manage packet flow and prevent deadlocks. Router arbitration utilizes a weighted round-robin mechanism, ensuring fair yet prioritized servicing of traffic from various sources, with quality-of-service (QoS) weights adjustable at runtime to optimize for compute-heavy or memory-intensive workloads. This 2D NoC mesh logically extends across the UCIe interconnects, creating a unified mesh-connected processor at the logical level. The low chiplet-to-chiplet latency significantly simplifies software development, allowing developers to treat the multi-chiplet package as a single entity.
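The two mechanisms named above can be sketched in a few lines. This is a minimal illustration of dimension-ordered XY routing and weighted round-robin arbitration in general, under assumed mesh coordinates, port names, and weights; it is not Rebellions' implementation:

```python
# Sketch: deterministic XY routing plus weighted round-robin arbitration
# on a 2D mesh. Coordinates, port names, and weights are illustrative.

def xy_next_hop(cur, dst):
    """Route fully in X first, then in Y -- the deadlock-free dimension order."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return "EAST" if dx > cx else "WEST"
    if cy != dy:
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"  # packet has arrived

class WeightedRoundRobinArbiter:
    """Grant each input port up to `weight` slots per round (runtime-tunable QoS)."""
    def __init__(self, weights):
        self.weights = dict(weights)      # port -> QoS weight
        self.credits = dict(weights)      # grants remaining this round
    def grant(self, requesting_ports):
        for port in self.weights:         # fixed scan order within a round
            if port in requesting_ports and self.credits[port] > 0:
                self.credits[port] -= 1
                return port
        self.credits = dict(self.weights)  # round over: refill all credits
        return self.grant(requesting_ports) if requesting_ports else None

# A packet from router (0, 0) to (2, 1) moves east twice, then north once.
hops, cur = [], (0, 0)
while (step := xy_next_hop(cur, (2, 1))) != "LOCAL":
    hops.append(step)
    cur = {"EAST": (cur[0] + 1, cur[1]), "NORTH": (cur[0], cur[1] + 1)}[step]
print(hops)  # ['EAST', 'EAST', 'NORTH']
```

Because every packet exhausts its X moves before turning, no cyclic channel dependency can form, which is why XY routing is deadlock-free on a mesh without extra virtual channels.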
While the UCIe specification allows optional mappings of standard protocols such as PCIe 6.0 and CXL (CXL.io, CXL.cache, and CXL.mem), Rebellions has opted to leverage vendor-defined streaming and memory-semantics protocols tailored for the Rebel 100. The design incorporates an aggressive data-movement engine, featuring a configurable DMA subsystem with eight execution engines capable of accessing local HBM3E, remote HBM3E on other chiplets, or distributed shared memory, offering up to 2.6 TB/s of bandwidth per DMA engine. Task-level QoS controls are implemented to prevent resource starvation and minimize latency.
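A descriptor-based dispatcher of the kind described might look like the following. The descriptor fields, target naming, and starvation guard are illustrative assumptions about such an interface, not Rebellions' actual programming model:

```python
# Sketch: a descriptor-based DMA dispatcher with task-level QoS, mirroring
# the description of eight engines targeting local HBM, remote-chiplet HBM,
# or distributed shared memory. All names/fields are illustrative.

from dataclasses import dataclass
from heapq import heappush, heappop
from itertools import count

@dataclass
class DmaDescriptor:
    src: str          # e.g. "hbm:local", "hbm:chiplet2", "dsm"
    dst: str
    nbytes: int
    priority: int     # lower value = more urgent QoS class

class DmaDispatcher:
    """Serve descriptors by QoS class, oldest-first within a class."""
    def __init__(self, num_engines=8):
        self.num_engines = num_engines
        self.queue = []        # heap of (priority, seqno, descriptor)
        self.seq = count()     # FIFO tiebreak: equal-priority work can't starve
    def submit(self, desc):
        heappush(self.queue, (desc.priority, next(self.seq), desc))
    def issue(self):
        """Pop up to one descriptor per engine for this issue cycle."""
        batch = []
        while self.queue and len(batch) < self.num_engines:
            batch.append(heappop(self.queue)[2])
        return batch

dma = DmaDispatcher()
dma.submit(DmaDescriptor("hbm:local", "dsm", 4096, priority=1))
dma.submit(DmaDescriptor("hbm:chiplet2", "hbm:local", 65536, priority=0))
print([d.src for d in dma.issue()])  # ['hbm:chiplet2', 'hbm:local']
```

The sequence-number tiebreak is one simple way to get the anti-starvation property the article attributes to the task-level QoS controls: a low-priority descriptor is delayed, but never reordered behind later work of its own class.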
Synchronization across the four chiplets is managed by dedicated hardware synchronization managers within each NPU. These managers provide centralized or autonomous control, minimizing inter-unit dependencies and coordination overhead to maintain high utilization. To enhance the reliability of the die-to-die interface, Rebellions has implemented advanced diagnostic features, including loopback modes and transaction-level tracking. For commercial applications, a configurable switching mode offers a trade-off between peak performance and improved Mean Time Between Failures (MTBF) and Mean Time To Failure (MTTF), crucial for large-scale AI deployments where uptime is paramount.
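The cross-chiplet coordination described above resembles a classic barrier. The counter-and-generation protocol below is an illustrative sketch of how such a hardware sync manager could behave, not the disclosed design:

```python
# Sketch: a generation-counting barrier for cross-chiplet synchronization.
# Each chiplet's sync manager registers an arrival; no participant proceeds
# until all four have checked in. Protocol details are assumptions.

class ChipletBarrier:
    """Barrier over a shared arrival counter with a release generation."""
    def __init__(self, participants=4):
        self.participants = participants
        self.arrived = 0
        self.generation = 0              # bumps each time the barrier opens
    def arrive(self):
        """Register one chiplet's arrival; return the generation it joined."""
        gen = self.generation
        self.arrived += 1
        if self.arrived == self.participants:
            self.arrived = 0             # reset counter for the next round
            self.generation += 1         # release every waiter in this round
        return gen
    def is_open(self, gen):
        """A chiplet may proceed once its generation has been released."""
        return self.generation > gen

barrier = ChipletBarrier(participants=4)
gens = [barrier.arrive() for _ in range(4)]   # all four NPU chiplets arrive
print(all(barrier.is_open(g) for g in gens))  # True: everyone may proceed
```

Tracking a generation rather than a single flag lets back-to-back barrier rounds reuse the same counter safely, which is the kind of low-overhead coordination the hardware managers are described as providing.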