Illusion of large on-chip memory by networked computing chips for neural network inference
Hardware for deep neural network (DNN) inference often suffers from insufficient on-chip memory, thus requiring accesses to separate memory-only chips. Such off-chip memory accesses incur considerable costs in terms of energy and execution time. Fitting entire DNNs in on-chip memory is challenging due, in particular, to the physical size of the technology. Here, we report a DNN inference system, termed Illusion, that consists of networked computing chips, each of which contains a certain minimal amount of local on-chip memory and mechanisms for quick wakeup and shutdown. In hardware, an eight-chip Illusion system achieves energy and execution time within 3.5% and 2.5%, respectively, of an ideal single chip with no off-chip memory. Illusion is flexible and configurable, achieving near-ideal energy and execution times for a wide variety of DNN types and sizes. Our approach is tailored for on-chip non-volatile memory with resilience to permanent write failures, but is applicable to several memory technologies. Detailed simulations also show that our hardware results could be scaled to 64-chip Illusion systems.
Defense Advanced Research Projects Agency
Robert M. Radway, Andrew Bartolo, Paul C. Jolly, Zainab F. Khan, Binh Q. Le, Pulkit Tandon, Tony F. Wu, Yunfeng Xin, Elisa Vianello, Pascal Vivet, Etienne Nowak, H.-S. Philip Wong, Mohamed M. Sabry Aly, Edith Beigne, Mary Wootters, and Subhasish Mitra. "Illusion of large on-chip memory by networked computing chips for neural network inference." Nature Electronics (2021): 71-80. https://doi.org/10.1038/s41928-020-00515-3