If you want to train neural networks quickly, keep packing in the transistors.
The machine-learning world is obsessed with training AI models as quickly as possible and, if the hardware community is to keep up, future chips will need to have more memory.
That’s according to Phillip Wong, veep of corporate research at TSMC, one of the world’s largest chip manufacturers, who began his keynote at the Hot Chips conference in Silicon Valley this week by discussing – you guessed it – Moore’s Law.
Instead of mourning its passing, however, Wong declared that it actually isn’t dead yet and, in fact, the number of transistors that can be crammed inside microprocessors has continued to rise over time. “Moore’s Law is well and alive. It’s not dead, it’s not slowing down, and it’s not even sick,” he claimed on stage.
Yes, the processing power per transistor hasn’t increased by much but that’s not important. Wong argued that even if you built a chip with higher transistor performance, if the density of transistors isn’t high then it’s no good for machine learning. Having more transistors allows engineers to support multi-core chips, build cheap accelerators and increase the amount of SRAM in microprocessors.
SRAM, or static random-access memory, is a type of fast temporary memory found in chips like GPUs, ASICs or FPGAs. Neural networks are trained to perform specific tasks by learning common patterns in a dataset. Input data is passed through the model and, after crunching through tons of vector maths operations, it spits out an output.
During the training process, this is repeated many times until the system’s performance is satisfactory. That might mean being able to recognise people’s faces or voices in the given dataset to a good accuracy. Down at the hardware level, chips have to process the dataset in batches.
Thousands or millions of images or audio clips, for example, are carefully funnelled down a data pipeline that allows chips to process them in chunks. The dataset is stored in CPU or chip RAM and bits of it are passed to a GPU’s SRAM for processing. Both components have to communicate with one another in order to shuttle data back and forth.
In order to speed the whole process up, chips need more memory to hold as much data as possible from the overall training dataset at any given time, Wong explained. It means that more of the processing power can be spent on running the relevant calculations that actually train neural networks, rather than the chip instructions that control how data passes from CPU to GPU.
As AI models get bigger with more layers and parameters, chips will have to include more memory to train them at increasing speeds. Imagine it a bit like taking water from a well to your house. Having a larger bucket means that you’ll be able to carry more water from the well at any given time, so you don’t need to make as many trips back and forth.
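The bucket analogy can be put into rough numbers. The sketch below is a back-of-the-envelope illustration only: the function name `transfers_needed` and the dataset and memory sizes are hypothetical, not figures from Wong's talk, and it assumes the whole on-chip memory is free to hold training data.

```python
def transfers_needed(dataset_bytes: int, on_chip_bytes: int) -> int:
    """Trips needed to stream the dataset through on-chip memory once.

    Hypothetical helper: assumes all on-chip memory can hold training data.
    """
    if on_chip_bytes <= 0:
        raise ValueError("on-chip memory must be positive")
    # Integer ceiling division: a partly filled bucket still costs a trip.
    return -(-dataset_bytes // on_chip_bytes)


MIB = 1024 ** 2
GIB = 1024 ** 3

# Made-up sizes: a 10 GiB dataset and two on-chip memory capacities.
small_bucket = transfers_needed(10 * GIB, 16 * MIB)  # 640 trips
large_bucket = transfers_needed(10 * GIB, 64 * MIB)  # 160 trips
print(small_bucket, large_bucket)
```

Under these assumed sizes, quadrupling the on-chip memory cuts the number of trips to a quarter, which is the point of the larger bucket.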
It’s why memory bandwidth – the amount of data that can be transferred to and from an AI chip in a given time – is more important than latency, the delay before that data starts to arrive. In an ideal situation, the size of memory on a chip will be larger than the training dataset, Wong said. “We need early engagement between [engineers working on] system applications, device technology, and chip design to make this work.”
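To see why bandwidth can matter more than latency when shuttling large datasets, here is a deliberately simplified model. It assumes each trip pays a fixed latency and the data then streams at full bandwidth. The function `transfer_time_s` and every number in it are hypothetical, chosen only for illustration.

```python
def transfer_time_s(total_bytes: int, chunk_bytes: int,
                    latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Simplified cost model: one fixed latency per trip, plus streaming time."""
    trips = -(-total_bytes // chunk_bytes)  # ceiling division
    return trips * latency_s + total_bytes / bandwidth_bytes_per_s


# Made-up figures: 10 GB moved in 100 MB chunks, 1 microsecond of latency
# per trip, and 100 GB/s of memory bandwidth.
total, chunk = 10 ** 10, 10 ** 8
baseline = transfer_time_s(total, chunk, 1e-6, 1e11)
half_latency = transfer_time_s(total, chunk, 0.5e-6, 1e11)
double_bandwidth = transfer_time_s(total, chunk, 1e-6, 2e11)

# With these assumptions the streaming term dominates, so doubling the
# bandwidth saves far more time than halving the latency.
print(baseline, half_latency, double_bandwidth)
```

The model ignores overlap between transfers and computation, so treat it as a sketch of the trade-off rather than a performance prediction.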
The Register