PrivateGPT and CPU without AVX2


In my quest to explore Generative AI and LLMs, I have tried setting up local/offline models. This is where PrivateGPT comes in, along with gpt4all.

I wanted to try both and realized that gpt4all requires a GUI to run in most cases, and there’s still a long way to go before we get proper headless support out of the box. PrivateGPT, however, has its own ingestion logic and supports the GPT4All and LlamaCpp model types, so I started to explore it in more detail. There are many prerequisites if you want to work with it, the most important of which is being able to spare a lot of RAM and a lot of CPU for processing power (a GPU is better, but I stuck with a non-GPU machine to focus specifically on CPU-optimized settings).

This post is more of a reminder to myself for when I encounter these errors again, and hopefully it can help others going through the same process.

Python

You need Python 3.10 to run this system. Ubuntu 20.04 and similar systems don’t ship it by default, so you’ll need a PPA to get Python 3.10 on them. The deadsnakes PPA seems to be the most frequently referenced source for it.

Commands to set this up:

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.10 python3.10-dev python3.10-distutils 

Installing pip and other packages

Expert Tip: Use venv to avoid breaking your machine’s Python base.

Create a new venv environment in the folder containing privateGPT. This is a one-time step.

python3.10 -m venv venv 

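After that, each working session uses the following two commands: the first activates the environment before you start, and the second leaves it when you are done.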

source venv/bin/activate
deactivate
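To confirm that the activated environment is the one actually running your scripts, a quick check from Python can help. This is a minimal sketch using only the standard library; the helper names `is_supported_python` and `in_virtualenv` are my own and are not part of privateGPT.

```python
import sys


def is_supported_python(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter is at least the minimum version."""
    return tuple(version_info[:2]) >= minimum


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix."""
    return prefix != base_prefix


if __name__ == "__main__":
    print("Python 3.10+:", is_supported_python())
    print("Inside a venv:", in_virtualenv())
```

Run it right after `source venv/bin/activate`; if either line prints `False`, the wrong interpreter is probably on your PATH.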

Find the model

The problem with PrivateGPT is that the result largely depends on the model you use. There are many interesting models available. However, the following models did not work for me.

  • ggml-gpt4all-j-v1.3-groovy.bin
Using embedded DuckDB with persistence: data will be stored in: db
Found model file.
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 5401.45 MB
Illegal instruction (core dumped)
  • ggml-stable-vicuna-13B.q4_2.bin
$ python3 privateGPT.py
Using embedded DuckDB with persistence: data will be stored in: db
Found model file.
gptj_model_load: loading model from 'models/ggml-stable-vicuna-13B.q4_2.bin' - please wait ...
gptj_model_load: invalid model file 'models/ggml-stable-vicuna-13B.q4_2.bin' (bad magic)
GPT-J ERROR: failed to load model from models/ggml-stable-vicuna-13B.q4_2.bin

This put a stop to my experiment, and I needed to start looking for alternatives. The first stop, as always, is the issue log on the project repository itself. Things get a little complicated because we are looking at 3 projects: llama.cpp, gpt4all and privateGPT.

  • Issue /issues/203 confirms my suspicion that I’m using an old CPU and that this might be the problem here. In particular, AVX2 support is needed.
  • To get the gpt4all model to work, the suggestion seems to be to recompile gpt4all. However, this gets complicated because I’m not using gpt4all directly but via a Python binding, so this would be a mess in itself.
  • This is where I turned my attention to llama.cpp because I had played around with it before and it worked to some degree.
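Before digging through issue logs, it is worth confirming whether the CPU reports AVX2 at all. The following is a minimal sketch, assuming an x86 Linux machine where `/proc/cpuinfo` contains a `flags` line; the function names are hypothetical helpers of mine, not part of any of the three projects.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_flag(cpuinfo_text, flag):
    """Return True if the given feature flag (e.g. 'avx2') is listed."""
    return flag in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        text = cpuinfo.read_text()
        for flag in ("avx", "avx2"):
            print(flag, "supported" if has_flag(text, flag) else "missing")
    else:
        print("/proc/cpuinfo not found; this check is Linux-only")
```

If `avx2` shows as missing, an "Illegal instruction (core dumped)" crash from a prebuilt binary is consistent with the suspicion above.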

I want to spend a few minutes talking about llama.cpp because it can help us convert models to a usable format, especially if they were created with a different type of software. It was one of the first few projects to come out and has a lot of tools around model customization.

I again recommend using venv for this project as well. After compiling llama.cpp you will be able to work with the model files as listed here: #prepare-data–execute. I’m not going to go into how to get the model; there are n ways, and this article will just assume you have the model and work from that point onwards.

I then remembered that I had set up llama.cpp and converted the 7B model a few days earlier, and that it was working in chat mode. However, the results were limited and that setup was not to my taste, because my main goal was to use my own notes as input. So I thought, let’s try that model; for privateGPT, all I needed to do was change the model type and path in .env:

MODEL_TYPE=LlamaCpp
MODEL_PATH=../llama.cpp/models/7B/ggml-model-q4_0.bin
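A relative path like the one above is easy to get wrong, so a small sanity check on the .env file can save a confusing failure later. This is a rough sketch with a hand-rolled parser for simple `KEY=VALUE` lines (it does not handle quoting or variable expansion); the helper names are mine, and the accepted `MODEL_TYPE` values are assumed from the two model types mentioned earlier.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_model_settings(env):
    """Return a list of readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    # Assumption: these are the only two accepted model type strings.
    if env.get("MODEL_TYPE") not in ("LlamaCpp", "GPT4All"):
        problems.append(f"unexpected MODEL_TYPE: {env.get('MODEL_TYPE')!r}")
    model_path = env.get("MODEL_PATH", "")
    if not Path(model_path).is_file():
        problems.append(f"MODEL_PATH is not a file: {model_path!r}")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.is_file():
        for problem in check_model_settings(parse_env(env_file.read_text())):
            print("warning:", problem)
```

Run it from the privateGPT folder so that relative paths resolve the same way they will for privateGPT.py.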

Running python3 privateGPT.py now opened the can of worms again.

Error: model format is no longer supported

error loading model: this format is no longer supported (see [ggerganov/llama.cpp#1305](/pull/1305))

Funny, but fair enough: I built the model a few weeks ago, which in the AI/GPT world feels like a decade. The description of the pull request referenced in the error above (#description) confirms that the quantization format has changed in llama.cpp. The fix was easy: do a git pull, recompile the project and repeat the conversion. After a few minutes, I was ready to run the updated model.

Next error: unknown magic, version

error loading model: unknown (magic, version) combination: 67676a74, 00000003; is this really a GGML file?

This is interesting. I’m pretty sure it’s a ggml file because I just created it, but the error is reminiscent of other tools that fail when a file’s magic values don’t match what they expect. I went back to the issue log and found /issues/409#issuecomment-1559128238 and more similar entries pointing to upgrading the llama-cpp-python binding. I noticed requirements.txt was pinning 0.1.50, so installing a newer version with pip and updating requirements.txt should solve it.

Commands for reference:

pip install llama-cpp-python==0.1.53
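To see what the loader is complaining about, you can read the header of the model file yourself. The sketch below assumes the file starts with two little-endian unsigned 32-bit integers (magic, then version), which is consistent with the values printed in the error above; `read_header` and `describe` are hypothetical helper names, and for older unversioned files the second value would not be a real version field.

```python
import struct


def read_header(path):
    """Return (magic, version) from the first 8 bytes of a model file.

    Both values are read as little-endian unsigned 32-bit integers.
    """
    with open(path, "rb") as f:
        data = f.read(8)
    if len(data) < 8:
        raise ValueError(f"{path} is too short to be a model file")
    return struct.unpack("<II", data)


def describe(path):
    """Format the header the same way the loader error prints it."""
    magic, version = read_header(path)
    return f"magic={magic:08x} version={version:08x}"
```

Calling `describe("../llama.cpp/models/7B/ggml-model-q4_0.bin")` shows which magic and version the converter wrote, which you can compare against what the installed binding reports it cannot handle.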

If neither of these is the error you are facing, I recommend keeping an eye on /issues/276#issuecomment-1554262627 for updates. For me, upgrading the llama-cpp-python binding worked and finally got my privateGPT instance running. I’m still experimenting with data ingestion.

Embedding files in a vector database

  1. Ingestion also works on subdirectories.
  2. There is a problem if a file has very long lines or contains characters that cannot be decoded as UTF-8.

I made a small change to the code to find the name of the file that caused an ingestion error due to Unicode issues: I added a try/except block to print the name of the offending file.

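def load_single_document(file_path: str) -> Document:
    # Document and LOADER_MAPPING come from earlier in the ingestion script.
    ext = "." + file_path.rsplit(".", 1)[-1]
    if ext in LOADER_MAPPING:
        loader_class, loader_args = LOADER_MAPPING[ext]
        try:
            loader = loader_class(file_path, **loader_args)
            return loader.load()[0]
        except UnicodeDecodeError:
            # Print the offending file name, then re-raise so the real error
            # is not masked by the "unsupported extension" error below.
            print(f"Could not decode: {file_path}")
            raise

    raise ValueError(f"Unsupported file extension '{ext}'")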

So far it has worked for me to identify what files need to be removed as sources. I haven’t found a better way to handle those files at the moment. I simply delete the offending files and rely on files that can be digested directly.
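Instead of discovering the offending files one ingestion run at a time, you can scan the source folder up front. This is a minimal sketch that only checks whether each file decodes as UTF-8; the function name, the default suffix list and the `source_documents` folder name in the usage note are assumptions of mine, so adjust them to your setup.

```python
from pathlib import Path


def find_non_utf8_files(root, suffixes=(".txt", ".md")):
    """Return files under root (recursively) whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with one of the chosen text suffixes.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

For example, `for p in find_non_utf8_files("source_documents"): print(p)` lists the candidates to fix or remove before running ingestion. It reads each file fully into memory, which is fine for notes but not ideal for very large files.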

Speed up the response

Once the ingestion process succeeds, you can run python3 privateGPT.py and get a prompt that will hopefully answer your question. It lists all the sources used to build the answer. However, you will soon realize that this is very slow. I immediately fired up htop to check how much load the process was putting on the server and, to my amusement and as expected, it was using only 1 thread, with RAM usage well under control.

So the thread count looked like something I needed to change, and sure enough there is a discussion in the privateGPT repo covering exactly this: /discussions/286#discussioncomment-5945851. A quick edit of my privateGPT.py file to the code below did the trick.

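    # Needs "import os" at the top of privateGPT.py; os.sched_getaffinity is not available on every platform.
    n_cpus = len(os.sched_getaffinity(0))
    match model_type:
        case "LlamaCpp":
            llm = LlamaCpp(model_path=model_path, n_threads=n_cpus, n_ctx=model_n_ctx, callbacks=callbacks, verbose=False)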

Now, running the code, I can see all 32 of my threads being used while it tries to find the “meaning of life”.
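One caveat with the thread-count change: `os.sched_getaffinity` only exists on some platforms (Linux among them). If you run the same script elsewhere, a fallback along these lines may be useful. This is a sketch, and `available_cpus` is a name I made up for it.

```python
import os


def available_cpus():
    """Number of CPUs this process may run on, with a portable fallback."""
    if hasattr(os, "sched_getaffinity"):
        # Respects CPU affinity limits (e.g. taskset or container restrictions).
        return len(os.sched_getaffinity(0))
    # os.cpu_count() can return None, so fall back to a single thread.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("threads to use:", available_cpus())
```

The result can be passed as `n_threads` in place of `n_cpus` in the snippet above.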

Bonus Tip:

If you’re just looking for an insanely fast search engine across all your record types, the vector DB makes life very simple. Set a fake model name so that no model is loaded, then enter a search string as your query; it will point to all the files and sources that contain the relevant text.

What is next?

It feels like I’m just starting on a wild ride, and there’s still so much more to learn and play with. I might write more on this topic once I have done more work in this area.


