
Faster, More Efficient, More Affordable Artificial Intelligence (AI)


Excerpt from the COE/CLS Convergence magazine (S23) article, "The Rise of AI"

Given their remarkable abilities, perhaps it's not surprising that the new AI-driven large language models (LLMs), like ChatGPT, are, well, power hungry. Training and running them is energy-intensive and expensive — the initial release of ChatGPT was trained on ten thousand Nvidia GPUs, at $1,500-$2,000 per unit, or roughly $15-20 million — which is one reason why, for now, LLMs are the exclusive realm of large companies. UCSB Computer Science assistant professor Yufei Ding is working to make LLMs faster, more customizable by individual users, less expensive, and more energy-efficient as a way of reducing their carbon footprint.

Ding conducts research on three main fronts. The first is designing a hardware accelerator tailored for LLM computing, unlike the CPU in a laptop or desktop computer, which, she says, “is tailored for more ‘general’ computing. It’s a change in the architecture of the hardware itself.”

The second area is software optimization, where she seeks to ensure that, “as hardware gets more and more complicated, an application can utilize it optimally, automatically.”
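As a rough illustration of that idea (a generic sketch using PyTorch's compiler stack, not Ding's own tooling), a compilation layer can map unchanged application code onto whatever hardware happens to be available:

```python
# A minimal sketch of "automatic" software optimization, using PyTorch 2.x's
# torch.compile as a stand-in for the kind of compiler layer described above.
# The same application code is lowered to optimized kernels for whatever
# hardware is present; the calling code never changes.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# torch.compile traces the model and generates fused, hardware-specific
# kernels behind the scenes.
compiled_model = torch.compile(model)

x = torch.randn(8, 1024, device=device)
y = compiled_model(x)  # first call compiles; later calls reuse the kernels
```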

The third area is fine-tuning of the models. “At the algorithm level, instead of doing end-to-end training [complete training of the model] for everything, maybe we can have a general, powerful foundation model that we need to train once and that will just need some lightweight fine-tuning, such as tuning ChatGPT for medical care — to use it for other applications later,” Ding explains.
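The "lightweight fine-tuning" she mentions is commonly realized with parameter-efficient methods such as low-rank adaptation (LoRA). The following is a minimal sketch of that general idea in PyTorch, not of Ding's specific algorithm: the pretrained weight stays frozen while two small matrices carry the task-specific update.

```python
# A hedged sketch of low-rank adaptation (LoRA), one common form of
# lightweight fine-tuning. The base layer is frozen; only the small A and B
# matrices (a tiny fraction of the parameters) are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the foundation weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank update: delta_W = B @ A, far fewer parameters than W itself.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Wrapping a pretrained layer: the base weight is reused, and only
# ~2 * rank * d new parameters are trained for the downstream task
# (e.g., tuning for medical text).
layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # ~16k vs ~1M in the base layer
```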

“Or maybe I want to give personal information to ChatGPT so that it can help revise my paper, but I want to keep it private; I don’t want the model to be trained on my data,” she explains. “That’s a fine-tuning process that could be done only on my own computer. Big companies have many thousands of GPUs running together, but I might have only one single laptop. How can I do that fine-tuning? It puts new challenges on the hardware and software designs.”
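Continuing the sketch above, single-machine fine-tuning of this kind might look like the following, a hedged illustration under the assumption that only adapter parameters are trained and the training data stays on the local disk; this is not Ding's published system:

```python
# A sketch of private, on-device fine-tuning, reusing the LoRALinear wrapper
# from the earlier sketch. The data below is a synthetic stand-in; in the
# scenario described above it would be personal data that never leaves the
# machine. Gradient accumulation trades compute for memory so the loop can
# fit on a single laptop GPU (or CPU).
import torch
import torch.nn as nn
import torch.nn.functional as F

model = LoRALinear(nn.Linear(1024, 1024), rank=8)  # frozen base + adapters
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# Stand-in for local, private training pairs (input, target).
local_data = [(torch.randn(2, 1024), torch.randn(2, 1024)) for _ in range(64)]

accum_steps = 8  # emulate a larger batch without the memory cost
for step, (x, target) in enumerate(local_data):
    loss = F.mse_loss(model(x), target) / accum_steps
    loss.backward()  # gradients flow only into the small adapter matrices
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```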

The various areas of Ding’s work address different scales of optimization that grow in scope and layer upon each other, from the smallest, a single device, to multiple devices within a node, up to inter-node coherence and communication. For end-to-end training, big companies are most concerned with parallelizing their thousands of servers to optimize efficiency and service. For a small company or an individual trying to fine-tune an LLM, privacy might be the main concern.
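At the multi-device end of that spectrum, the standard baseline is data parallelism: each GPU holds a replica of the model, and gradients are synchronized between replicas after every backward pass. A minimal sketch with PyTorch's DistributedDataParallel, a standard tool rather than anything specific to Ding's work:

```python
# A minimal sketch of data parallelism with PyTorch's DistributedDataParallel
# (DDP), one standard way to parallelize training across devices within a
# node (and, with more ranks, across nodes). Uses the CPU-friendly "gloo"
# backend so the sketch runs anywhere; nothing here is Ding's own system.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank: int, world_size: int):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(nn.Linear(1024, 1024))  # each rank holds a full replica
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    x = torch.randn(4, 1024)
    loss = model(x).pow(2).mean()
    loss.backward()  # DDP all-reduces gradients across ranks here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2  # two processes standing in for two GPUs
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```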

“Things like what kind of hardware you have, what you can afford, and what kind of task you want to do determine the optimization you need to have,” she says. “We want to work across scales to ensure good performance in all kinds of scenarios.”

COE/CLS Convergence magazine (S23) - "The Rise of AI" (page 25)
