On big questions like this, almost nothing stays constant. When we consider a new technology:
We cannot assume that rates of adoption or usage will remain constant—they may drop, or they may even grow.
We cannot assume that the technology supplying our energy needs will remain constant—there could be breakthroughs in efficiency or changes in the overall energy mix.
We cannot assume that the efficiency of the specific technology being adopted will remain constant—we have seen many cases in technology and software where an initial version is followed by improvements that deliver greater capability with lower energy usage.
We must also recognise that artificial intelligence (AI) itself could suggest improvements in energy efficiency for specific applications—like the heating and cooling of a building. Therefore, any analysis of energy usage and AI must recognise that the one constant will be change.
Environmental impact of select large language models (LLMs)
LLMs have been garnering the lion’s share of attention amidst the current excitement around generative AI. It makes sense to consider the amount of carbon emissions generated by some of these systems. The Stanford AI Index Report, published in 2023, provided some data, noting that factors like the number of parameters in a model, the power usage effectiveness1 of a data centre, and the grid carbon intensity all matter.
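To see how those three factors interact, a back-of-the-envelope sketch can help. The function below, and every figure in it, is our own illustrative assumption rather than data from the report; it simply multiplies the energy drawn by the accelerators during training, the facility overhead captured by PUE and the carbon intensity of the grid.

```python
# A hedged, back-of-the-envelope sketch of how the main factors combine.
# All inputs here are placeholders for illustration, not measurements.
def training_emissions_kg(
    accelerator_hours: float,   # total chip-hours for the training run
    avg_power_kw: float,        # average draw per accelerator, in kW
    pue: float,                 # data-centre power usage effectiveness
    grid_kgco2_per_kwh: float,  # carbon intensity of the local grid
) -> float:
    """Estimate training emissions: energy at the chips, scaled up by
    facility overhead (PUE), then converted via grid carbon intensity."""
    facility_kwh = accelerator_hours * avg_power_kw * pue
    return facility_kwh * grid_kgco2_per_kwh


# Illustrative inputs only: 100,000 chip-hours at 0.3 kW per chip,
# a PUE of 1.1 and a grid emitting 0.4 kg CO2e per kWh.
print(f"{training_emissions_kg(100_000, 0.3, 1.1, 0.4):,.0f} kg CO2e")
```

The structure of the calculation makes the levers visible: a more efficient model needs fewer chip-hours, a better data centre lowers PUE, and a cleaner grid lowers the final multiplier.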
Considering power consumption of an LLM
Those building different LLMs have many levers they can pull to influence characteristics like energy consumption. Google researchers proposed a family of language models named GLaM (Generalist Language Model), which uses a ‘sparsely activated mixture of experts’2. While a full discussion of how that approach works is beyond the scope of this piece, we note that the largest of the GLaM models has 1.2 trillion parameters. On that data point alone, one would assume that this model consumes more energy than any of the other models discussed.
In reality, the GLaM model with 1.2 trillion parameters consumes only one-third of the energy required to train GPT-3 and requires only half the computational FLOPs (floating point operations) for inference. A simple way to think about what is going on is that, while the total model has 1.2 trillion parameters, a given input token activates a maximum of 95 billion of them; in other words, the full set of parameters is never active at once. GPT-3, on the other hand, activated all 175 billion of its parameters on each input token3. It is notable that, although the performance of AI models is measured along many dimensions, by many of those measures the GLaM model outperforms GPT-3 as well4.
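To make the idea of sparse activation concrete, the toy sketch below (our own illustration in Python with made-up dimensions, not GLaM’s actual architecture or code) routes each token to only its top-2 of eight small ‘expert’ feed-forward blocks. All eight experts contribute to the total parameter count, but only two of them perform any computation for a given token.

```python
# A minimal sketch of sparsely activated mixture-of-experts routing.
# Dimensions and weights are toy values chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_hidden, n_experts, top_k = 64, 256, 8, 2

# Each "expert" is an independent feed-forward block; together the
# experts hold most of the layer's parameters.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.02,
     rng.standard_normal((d_hidden, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02  # gating weights


def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through its top-k experts only."""
    logits = token @ router
    top = np.argsort(logits)[-top_k:]  # indices of the chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    out = np.zeros(d_model)
    for gate, idx in zip(gates, top):
        w_in, w_out = experts[idx]
        out += gate * (np.maximum(token @ w_in, 0.0) @ w_out)  # ReLU feed-forward
    return out


token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (64,) -- only 2 of the 8 experts did any work
```

This is the design trade-off behind the reported numbers: adding experts grows the total parameter count (capacity), while per-token computation and energy scale only with the number of experts actually activated.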
Conclusion
The bottom line is that model design matters: if model designers want to find ways to maintain performance while using less energy, they have many options.
Sources
1 Power usage effectiveness (PUE) is useful in evaluating the energy efficiency of data centres in a standard way. PUE = (total amount of energy used by a computer data centre facility) / (energy delivered to computer equipment). A higher PUE means that the data centre is less efficient.
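As an illustration with hypothetical numbers: a facility drawing 1.25 megawatt-hours in total while delivering 1.0 megawatt-hour to its computing equipment would have a PUE of 1.25/1.0 = 1.25, with the extra 25% going to cooling, power distribution and other facility overhead.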
2 Source: Du et al. “GLaM: Efficient Scaling of Language Models with Mixture-of-Experts.” arXiv.org, 1 August 2022.
3 Source: Patterson, David; Gonzalez, Joseph; Hölzle, Urs; Le, Quoc V.; Liang, Chen; Munguia, Lluis-Miquel; et al. “The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink.” TechRxiv, 2022.
4 Source: Du et al., 1 August 2022.