The design can run a big neural network more efficiently than banks of GPUs wired together. But manufacturing and running the chip is a challenge, requiring new methods for etching silicon features, a design that includes redundancies to account for manufacturing flaws, and a novel water system to keep the giant chip chilled.
To build a cluster of WSE-2 chips capable of running AI models of record size, Cerebras had to solve another engineering challenge: how to get data in and out of the chip efficiently. Regular chips have their own memory on board, but Cerebras developed an off-chip memory box called MemoryX. The company also created software that allows a neural network to be partially stored in that off-chip memory, with only the computations shuttled over to the silicon chip. And it built a hardware and software system called SwarmX that wires everything together.
“They can improve the scalability of training to huge dimensions, beyond what anybody is doing today,” says Mike Demler, a senior analyst with the Linley Group and a senior editor of The Microprocessor Report.
Demler says it isn’t yet clear how much of a market there will be for the cluster, especially since some potential customers are already designing their own, more specialized chips in-house. He adds that the real performance of the chip, in terms of speed, efficiency, and cost, are as yet unclear. Cerebras hasn’t published any benchmark results so far.
“There’s a lot of impressive engineering in the new MemoryX and SwarmX technology,” Demler says. “But just like the processor, this is highly specialized stuff; it only makes sense for training the very largest models.”
Cerebras’ chips have so far been adopted by labs that need supercomputing power. Early customers include Argonne National Labs, Lawrence Livermore National Lab, pharma companies including GlaxoSmithKline and AstraZeneca, and what Feldman describes as “military intelligence” organizations.
This shows that the Cerebras chip can be used for more than just powering neural networks; the computations these labs run involve similarly massive parallel mathematical operations. “And they’re always thirsty for more compute power,” says Demler, who adds that the chip could conceivably become important for the future of supercomputing.
David Kanter, an analyst with Real World Technologies and executive director of MLCommons, an organization that measures the performance of different AI algorithms and hardware, says he sees a future market for much bigger AI models. “I generally tend to believe in data-centric ML [machine learning], so we want larger data sets that enable building larger models with more parameters,” Kanter says.