Deploying llama.cpp on AWS (with Troubleshooting)

This tutorial was tested on a g4dn.xlarge instance running Ubuntu 22.04, and the steps below assume that operating system.

Installation Steps

  1. Start an EC2 instance of any class that has a CUDA-capable NVIDIA GPU.

    If you want to compile llama.cpp on this instance, you will need at least 4 GB of disk space for CUDA and the drivers, plus enough space for your LLM of choice; I recommend at least 30 GB in total. Perform the following steps of this tutorial on the instance you started. If you prefer the AWS CLI over the console, see the launch sketch after this list.

  2. Install build dependencies:

    sudo apt update
    sudo apt install build-essential ccache
    
  3. Install CUDA Toolkit (only the Base Installer). Download it and follow instructions from
    https://developer.nvidia.com/cuda-downloads

    At the time of writing this tutorial, the newest Ubuntu release supported by the CUDA Toolkit installer was 22.04. But do not fear! 🙂 We’ll get it to work with some minor workarounds (see the Potential Errors section). A sketch of the deb (network) install commands follows this list.

  4. Install NVIDIA Drivers:

    sudo apt install nvidia-driver-555
    
  5. Compile llama.cpp:

    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    LLAMA_CUDA=1 make -j
    
  6. Benchmark llama.cpp (optional):

    Follow the official guide if you intend to run the benchmark, but keep using LLAMA_CUDA=1 make to compile llama.cpp (do not use LLAMA_CUBLAS=1):
    https://github.com/ggerganov/llama.cpp/discussions/4225

    Instead of quantizing a model yourself, you can download pre-quantized models from Hugging Face. For example, Mistral 7B Instruct is available at https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/tree/main (a one-line download sketch follows this list).
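
If you prefer the AWS CLI over the console for step 1, a launch command along the following lines should work. This is only a sketch: the AMI ID, key pair name, and security group below are placeholders you must replace with values from your own account and region (pick an Ubuntu 22.04 AMI).

# Placeholders: replace the AMI ID (an Ubuntu 22.04 image for your region),
# the key pair name, and the security group with values from your own account.
aws ec2 run-instances \
  --instance-type g4dn.xlarge \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --key-name my-key-pair \
  --security-group-ids sg-xxxxxxxxxxxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30,"VolumeType":"gp3"}}]'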
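
For step 3, the deb (network) variant of the CUDA Toolkit installation on Ubuntu 22.04 looked roughly like this at the time of writing. The keyring and toolkit version numbers change over time, so copy the exact commands from the download page rather than from here.

# Version numbers below are examples and may be outdated; check the CUDA downloads page.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
# Installs only the toolkit (the "Base Installer" part)
sudo apt install cuda-toolkit-12-4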
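
If you go the pre-quantized route for step 6, a single wget is enough. The file below is the Q4_K_M quantization from the Hugging Face repository linked above, and its name matches the benchmark command used later in this post; place it wherever the relative path in that command expects it.

wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf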

Potential Errors

CUDA Architecture Must Be Explicitly Provided

ERROR: For CUDA versions < 11.7 a target CUDA architecture must be explicitly 
provided via environment variable CUDA_DOCKER_ARCH, e.g. by running 
"export CUDA_DOCKER_ARCH=compute_XX" on Unix-like systems, where XX is the 
minimum compute capability that the code needs to run on. A list with compute 
capabilities can be found here: https://developer.nvidia.com/cuda-gpus

You need to check the page mentioned in the error (https://developer.nvidia.com/cuda-gpus)
and pick the value that matches your instance's GPU. g4dn instances
use the NVIDIA T4 GPU, which corresponds to compute_75.

For example:

CUDA_DOCKER_ARCH=compute_75 LLAMA_CUDA=1 make -j
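
If you are not sure which GPU your instance has, recent NVIDIA drivers can report the compute capability directly, which saves a trip to the CUDA GPUs page. This assumes the driver is already installed and working:

# Prints the GPU name and its compute capability, e.g. "Tesla T4, 7.5" -> compute_75
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader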

NVCC not found

/bin/sh: 1: nvcc: not found

You need to add the CUDA binary and library paths to your shell environment variables.

For example, with Bash and CUDA 12:

export PATH="/usr/local/cuda-12/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12/lib64:$LD_LIBRARY_PATH"
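
To make this survive new SSH sessions, you can append the same exports to your ~/.bashrc (the paths assume CUDA 12 was installed under /usr/local/cuda-12, as in the example above):

echo 'export PATH="/usr/local/cuda-12/bin:$PATH"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="/usr/local/cuda-12/lib64:$LD_LIBRARY_PATH"' >> ~/.bashrc
source ~/.bashrc
# Verify that the compiler is now found
nvcc --version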

cannot find -lcuda

/usr/bin/ld: cannot find -lcuda: No such file or directory

That means your NVIDIA drivers are not installed. Install the NVIDIA drivers (step 4) first.
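
A quick way to confirm this is to check whether the driver's libcuda library is known to the linker; if nothing is printed, install the driver and reboot:

# libcuda.so ships with the NVIDIA driver, not with the CUDA Toolkit
ldconfig -p | grep libcuda
# If the command prints nothing, install the driver (step 4) and reboot
sudo apt install nvidia-driver-555
sudo reboot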

Cannot communicate with NVIDIA driver

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

If you have already installed the drivers, reboot the instance so that the kernel modules get loaded.
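
You can check whether the kernel modules are actually loaded before rebooting; if the list comes back empty, a reboot normally fixes it:

# Should list modules such as nvidia, nvidia_uvm, nvidia_drm
lsmod | grep nvidia
sudo reboot
# After reconnecting, nvidia-smi should list the GPU
nvidia-smi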

Failed to decode the batch

failed to decode the batch, n_batch = 0, ret = -1
main: llama_decode() failed

There are two potential causes of this issue.

Option 1: Install NVIDIA drivers

Make sure you have installed the CUDA Toolkit and the NVIDIA drivers. If you have, the NVIDIA kernel modules are most likely just not loaded yet; restart the instance and try again.

sudo reboot

Option 2: Use different benchmarking parameters

For example, with Mistral 7B Instruct, the following worked for me:

./batched-bench ../mistral-7b-instruct-v0.2.Q4_K_M.gguf 2048 2048 512 0 999 128,256,512 128,256 1,2,4,8,16,32