As we continue exploring this generative AI project, ethical considerations are at the forefront of our minds. As Robynne posited in her first post, how do we define ethical generative AI? What factors do we need to be aware of to ensure that this tool aligns with our values? With this intention, I delved into the complicated world of ethics surrounding AI. From my research, I identified 10 common ethical considerations related to the boom in generative AI, and one of them is environmental impact.
It is difficult to quantify the environmental impact of generative AI. It extends from the mining of raw materials used to make the hardware in data centres to the water used to cool those centres, but the most obvious resource consumed is energy. Like any other AI, generative AI uses a large amount of energy in its training stage. According to a 2023 study by Alex de Vries, GPT-3’s training process alone consumed an estimated 1,287 MWh of electricity, nearly equivalent to the annual energy consumption of 120 average American households in 2022. But energy consumption doesn’t end with the training phase. Each time someone prompts an LLM such as ChatGPT, the hardware that processes and performs these operations consumes energy, estimated at least five times that of a normal web search. And with the popularity of LLMs like ChatGPT, and with generative AI being added to seemingly every application and technology, the number of users is only growing.
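For scale, the household comparison can be sanity-checked with a line of arithmetic. The 10,700 kWh/year figure below is my own rough approximation of average US household consumption in 2022, not a number from the study:

```python
# Rough sanity check of the household comparison.
training_mwh = 1_287              # estimated GPT-3 training energy (de Vries, 2023)
household_kwh_per_year = 10_700   # assumed approx. average US household, 2022
households = training_mwh * 1_000 / household_kwh_per_year
print(round(households))          # prints 120
```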
With that in mind, what do we do to mitigate the environmental impact of our AI Study Tool? The general answer is to ensure that our tool is energy efficient, and we are currently exploring three ways to do this.
- Using smaller language models: Given that large language models (LLMs) such as ChatGPT consume lots of energy both in training and after release, an obvious way to reduce energy consumption is to use smaller models. A small language model (SLM) is distinguished from an LLM by its parameter count; a common rule of thumb classifies any model with fewer than 30 billion parameters as an SLM. Because an SLM has fewer parameters and is trained on a smaller dataset, it is more energy-efficient, less costly to train (in both time and energy), and has lower latency. An SLM also improves on the issues of bias and transparency: with a smaller dataset, you have more knowledge of, and control over, what goes into training your language model. We are unsure if our use case will allow us to use an SLM, but we are researching existing open-source models in the hope that we will be able to fine-tune one for our purposes.
- Cache-augmented generation (CAG): The most common way to ensure that information is accurate is to check valid sources before providing an answer, and the way language models usually do this is Retrieval-Augmented Generation, or RAG. After receiving a prompt, the model searches a knowledge base for information about the query, then generates an output using the material it has fetched, so that the answer is both accurate and up to date. This step is important for providing accurate information to users and limiting the “hallucinations” we are all warned about. But it means that, on top of processing a prompt and generating a response, we now have the added cost of searching for and processing sources every time the model is prompted. Enter Cache-Augmented Generation, or CAG! Instead of searching a large database (which could be the entire internet) at query time, the model is pre-loaded with its reference material, so no per-query retrieval is needed. CAG suits information that does not change frequently, such as one of our textbooks, and can also ensure the accuracy and validity of the information cited, so it seems perfect for our use case.
- Caching generated output: Judicious AI Use to Improve Existing OER by Royce Kimmons, George Veletsianos, and Torrey Trust suggests using caching to improve the energy efficiency of the language model. As discussed above, each prompt consumes energy, and that remains the case even if the same prompt is submitted over and over. By caching (storing) generated responses from the language model and returning them when a query is repeated, the tool uses less energy because it does not have to process and generate a new response each time. Further, the authors suggest serving those cached responses to students as OER, which reduces the number of prompts altogether and contributes to improving the equity of generative AI.
Though I’m a little overwhelmed by the sheer volume of information out there regarding energy efficiency in AI, much less the complex subject of AI ethics in general, I am excited about exploring these three solutions. As we start working with the developers for this project, I am interested in learning how these concepts work in practice and how feasible they are to implement in our AI Study Tool. Researching this project and ensuring it aligns with our values feels like a puzzle I’m trying to solve, and I am enjoying delving into the world of computer science once again and flexing those problem-solving muscles.