# Quantizing a model with llama.cpp
## Prepare
- Clone the llama.cpp repository with git.
- Create a Python virtual environment (venv) for the project.
- Install the prebuilt llama.cpp CLI tools.
Tip: On Windows, you can install the CLI tools with `winget install llama.cpp`.
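
The preparation steps can be sketched as a small shell function. This is a minimal sketch, not the only way to set things up: the function name `prepare_llamacpp` and the `.venv` directory name are assumptions made for illustration, and the prebuilt CLI tools still need to be installed separately.

```shell
# Hypothetical helper: clone llama.cpp and create a virtual environment in it.
prepare_llamacpp() {
    git clone https://github.com/ggml-org/llama.cpp || return 1
    cd llama.cpp || return 1
    python3 -m venv .venv      # ".venv" is an assumed directory name
    . .venv/bin/activate       # on Windows: .venv\Scripts\activate
}
```

Calling `prepare_llamacpp` leaves the shell inside the cloned repository with the virtual environment active, which is where the conversion step expects to run.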
## Convert the model to GGUF format
# run from the root of the cloned llama.cpp repository
pip install -r requirements.txt
py convert_hf_to_gguf.py <model path>
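
The conversion script also accepts an explicit output file and output type, which makes the next step easier because the GGUF path is known in advance. A minimal sketch, assuming the virtual environment is active so that `python` resolves to it; the helper name `convert_to_gguf` and the default file name `model-f16.gguf` are hypothetical:

```shell
# Hypothetical helper: convert a Hugging Face model directory to a 16-bit GGUF file.
# Run it from the root of the llama.cpp repository.
convert_to_gguf() {
    model_dir=$1                     # directory containing the Hugging Face model
    out_file=${2:-model-f16.gguf}    # assumed default output name
    python convert_hf_to_gguf.py "$model_dir" --outfile "$out_file" --outtype f16
}
```

For example, `convert_to_gguf ./my-model` would write `model-f16.gguf` to the current directory, where `./my-model` stands in for the real model path.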
## Quantize the GGUF model
# run from the directory that contains the GGUF model
# uses the llama-quantize CLI installed with llama.cpp
llama-quantize <gguf model path> Q4_K_M
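
`llama-quantize` also takes an optional output path between the input file and the quantization type. The sketch below passes one explicitly so the result has a predictable name. Both helper names and the naming scheme (input name plus the quantization type) are assumptions for illustration, not a llama.cpp convention.

```shell
# Hypothetical helper: build an output name by replacing the .gguf suffix
# with "-<type>.gguf", e.g. model-f16.gguf -> model-f16-Q4_K_M.gguf.
quantized_name() {
    printf '%s-%s.gguf\n' "${1%.gguf}" "$2"
}

# Hypothetical helper: quantize $1 to type $2 and write to the derived name.
quantize_gguf() {
    llama-quantize "$1" "$(quantized_name "$1" "$2")" "$2"
}
```

For example, `quantize_gguf model-f16.gguf Q4_K_M` would write `model-f16-Q4_K_M.gguf` next to the input file.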
# serve the model with an OpenAI-compatible API, an embeddings API, and a simple web UI
llama-server -m <model-path> --jinja -c 0 --host 127.0.0.1 --port 8033 --embeddings
# -c sets the context length; 0 uses the default from the model
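
Once the server is running, it can be checked from another terminal with `curl`. This is a minimal sketch that assumes the host and port from the command above and a loaded model that can produce pooled embeddings. The helper names are hypothetical, and the input text is inserted into the JSON verbatim, so it must not contain double quotes or backslashes.

```shell
BASE_URL=http://127.0.0.1:8033    # host and port passed to llama-server

# Ask the server whether it is up and the model is loaded.
check_health() {
    curl -s "$BASE_URL/health"
}

# Request an embedding for the text given as $1.
embed_text() {
    curl -s "$BASE_URL/v1/embeddings" \
        -H "Content-Type: application/json" \
        -d "{\"input\": \"$1\"}"
}
```

For example, `check_health` followed by `embed_text "hello world"` prints the raw JSON responses from the server.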