➀ A 30-billion-parameter LLM runs on a prototype inference appliance equipped with 16 IBM AIU NorthPole processors, achieving a system throughput of 28,356 tokens/second at a latency below 1 ms/token; ➁ at the lowest measured GPU latency, NorthPole delivers 72.7× better energy efficiency than GPUs while also achieving lower latency; ➂ the brain-inspired NorthPole architecture is optimized for AI inference and demonstrates superior performance on LLM inference.
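A quick back-of-envelope check of the reported figures, assuming (as the summary implies) that the 28,356 tokens/second is aggregate throughput across the whole 16-card appliance:

```python
# Sketch: derive per-card throughput from the reported system figures.
# Assumption: throughput is aggregate over all 16 NorthPole cards; the
# sub-1 ms/token latency is a separate per-token generation measure and
# is NOT simply the inverse of aggregate throughput.
system_tps = 28_356          # tokens/second, whole 16-card system
num_cards = 16

per_card_tps = system_tps / num_cards
print(f"{per_card_tps:.2f} tokens/s per NorthPole card")  # ≈ 1772.25
```

This puts each NorthPole card at roughly 1,770 tokens/second under the stated assumption.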