As we continue exploring this generative AI project, ethical considerations are at the forefront of our minds. As Robynne posited in her first post, how do we define ethical generative AI? What factors do we need to be aware of to ensure that this tool aligns with our values? With this intention, I delved into the complicated world of ethics surrounding AI. From my research, I identified 10 common ethical considerations related to the boom in generative AI, and one of them is environmental impact.
It is difficult to quantify the environmental impact of generative AI. It extends from the mining of raw materials used to make the hardware in data centres to the water used to cool those centres, but the most obvious resource consumed is energy. Like any other AI, generative AI uses a large amount of energy in its training stage. According to a 2023 study by Alex de Vries, GPT-3’s training process alone consumed an estimated 1,287 MWh of electricity, nearly equivalent to the annual energy consumption of 120 average American households in 2022. But energy consumption doesn’t end with the training phase. Each time someone prompts an LLM such as ChatGPT, the hardware that processes and performs these operations consumes energy, estimated at least five times that of a normal web search. And with the popularity of LLMs like ChatGPT, and with generative AI being added to seemingly every application and technology, the number of users is only growing.
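For scale, the household comparison can be sanity-checked with a line of arithmetic. The 10,700 kWh/year figure below is my own rough approximation of average US household consumption in 2022, not a number from the study:

```python
# Rough sanity check of the household comparison.
training_mwh = 1_287              # estimated GPT-3 training energy (de Vries, 2023)
household_kwh_per_year = 10_700   # assumed approx. average US household, 2022
households = training_mwh * 1_000 / household_kwh_per_year
print(round(households))          # prints 120
```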
With that in mind, what do we do to mitigate the environmental impact of our AI Study Tool? The general answer is to ensure that our tool is energy efficient, and we are currently exploring three ways to do this.
- Using smaller language models: Given that large language models (LLMs) such as ChatGPT consume lots of energy both in training and after release, an obvious way to reduce energy consumption is to use smaller models. A small language model (SLM) is distinguished from an LLM by its parameter count; a common rule of thumb classifies any model with fewer than 30 billion parameters as an SLM. Because an SLM has fewer parameters and is trained on a smaller dataset, it is more energy-efficient, less costly to train (in both time and energy), and has lower latency. An SLM also improves on the issues of bias and transparency: with a smaller dataset, you have more knowledge of, and control over, what goes into training your language model. We are unsure if our use case will allow us to use an SLM, but we are researching existing open-source models in the hope that we will be able to fine-tune one for our purposes.
- Cache-augmented generation (CAG): The most common way to ensure that information is accurate is to check valid sources before providing an answer, and the way language models usually do this is Retrieval-Augmented Generation, or RAG. After receiving a prompt, the model searches a knowledge base for information about the query, then generates an output using the material it has fetched, so that the answer is both accurate and up to date. This step is important for providing accurate information to users and limiting the “hallucinations” we are all warned about. But it means that, on top of processing a prompt and generating a response, we now have the added cost of searching for and processing sources every time the model is prompted. Enter Cache-Augmented Generation, or CAG! Instead of searching a large database (which could be the entire internet) at query time, the model is pre-loaded with its reference material, so no per-query retrieval is needed. CAG suits information that does not change frequently, such as one of our textbooks, and can also ensure the accuracy and validity of the information cited, so it seems perfect for our use case.
- Caching generated output: Judicious AI Use to Improve Existing OER by Royce Kimmons, George Veletsianos, and Torrey Trust suggests using caching to improve the energy efficiency of the language model. As discussed above, each prompt consumes energy, and that remains the case even if the same prompt is submitted over and over. By caching (storing) generated responses from the language model and returning them when a query is repeated, the tool uses less energy because it does not have to process and generate a new response each time. Further, the authors suggest serving those cached responses to students as OER, which reduces the number of prompts altogether and contributes to improving the equity of generative AI.
Though I’m a little overwhelmed by the sheer volume of information out there regarding energy efficiency in AI, much less the complex subject of AI ethics in general, I am excited about exploring these three solutions. As we start working with the developers for this project, I am interested in learning how these concepts work in practice and how feasible they are to implement in our AI Study Tool. Researching this project and ensuring it aligns with our values feels like a puzzle I’m trying to solve, and I am enjoying delving into the world of computer science once again and flexing those problem-solving muscles.