
It cannot be loaded by the newest version of llama.cpp... which version did you use when developing?

#2 opened by JamesYdAtJ3

rm -f /wwwFS.out/unix.socket.llama.sock ; /ai02/binLLM/llama-server --host /wwwFS.out/unix.socket.llama.sock --timeout 3609 -m /ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf --threads 5 --parallel 2
build_info: b8985-27aef3dd9
system_info: n_threads = 5 (n_threads_batch = 5) / 6 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 6 threads for HTTP server
start: setting address family to AF_UNIX
main: loading model
srv load_model: loading model '/ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
common_params_fit_impl: getting device memory data for initial parameters:
gguf_init_from_file_ptr: tensor 'blk.0.attn_k_norm.weight' has offset 203248672, expected 203129888
gguf_init_from_file_ptr: failed to read tensor data
llama_model_load: error loading model: llama_model_loader: failed to load model from /ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf
llama_model_load_from_file_impl: failed to load model
common_fit_params: encountered an error while trying to fit params to free device memory: failed to load model
common_fit_params: fitting params to free memory took 0.04 seconds
gguf_init_from_file_ptr: tensor 'blk.0.attn_k_norm.weight' has offset 203248672, expected 203129888
gguf_init_from_file_ptr: failed to read tensor data
llama_model_load: error loading model: llama_model_loader: failed to load model from /ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf'
srv load_model: failed to load model, '/ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
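Before reading too much into the log, it's worth ruling out a truncated or corrupted download by hashing the file and comparing against the checksum on the model's "Files and versions" page (Hugging Face shows a SHA-256 for LFS files). A minimal sketch, using the path from the command above:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large GGUF files don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

print(sha256_of("/ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf"))
# Compare the output with the SHA-256 listed for the .gguf on the model
# page; a mismatch means a bad download rather than a format problem.
```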

The 2-bit GGUF file looks corrupt: the tensor offset table doesn't match the actual data layout. Re-downloading or re-quantizing it should resolve the issue, I think.
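If the hash matches, the tensor table itself can be inspected with the reference gguf Python package (`pip install gguf`). A minimal sketch; note that if the model was written with a quantization type the package doesn't know about, the reader will fail during parsing, which would point to a custom format rather than a corrupt file:

```python
from gguf import GGUFReader  # reference reader maintained in the llama.cpp repo

reader = GGUFReader("/ai01/llama-models/Hy-MT1.5-1.8B-2bit.gguf")

# Print each tensor's recorded offset and size. llama.cpp recomputes the
# expected offsets from the tensor shapes and quantization block sizes, so
# a size mismatch for any one type shifts every offset that follows it --
# which is consistent with the "has offset X, expected Y" error above.
for t in reader.tensors:
    print(f"{t.name:48s} type={t.tensor_type.name:10s} "
          f"offset={t.data_offset} bytes={t.n_bytes}")
```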

AngelSlim org

We used a custom kernel for llama.cpp, which will be released soon.

AngelSlim org

We have released the STQ1_0 kernel for the 1.25-bit model and opened a PR against llama.cpp (PR #22836)! If you have any questions or suggestions about STQ1_0, feel free to comment under the PR! 🔥🔥🔥
The 2-bit kernel is on the way.
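For anyone curious what a 2-bit kernel has to deal with at the format level: the actual STQ layouts are defined in the PR, but as a purely generic illustration (this is not STQ1_0 and not the upcoming 2-bit format), block-wise 2-bit quantization stores a per-block scale plus four packed codes per byte, roughly like this:

```python
import numpy as np

BLOCK = 32  # illustrative block size, in the spirit of ggml's block quants

def quantize_2bit(block: np.ndarray) -> tuple[float, np.ndarray]:
    """Generic symmetric 2-bit block quantization (illustration only).
    Each weight maps to one of four levels {-2,-1,0,1} * scale, and four
    2-bit codes are packed into each byte."""
    assert block.size == BLOCK
    scale = max(float(np.abs(block).max()) / 2.0, 1e-12)
    codes = (np.clip(np.round(block / scale), -2, 1) + 2).astype(np.uint8)  # 0..3
    packed = codes[0::4] | (codes[1::4] << 2) | (codes[2::4] << 4) | (codes[3::4] << 6)
    return scale, packed

def dequantize_2bit(scale: float, packed: np.ndarray) -> np.ndarray:
    """Unpack four 2-bit codes per byte and undo the level mapping."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)
    return (codes.astype(np.float32) - 2.0) * scale

w = np.random.default_rng(0).standard_normal(BLOCK).astype(np.float32)
s, p = quantize_2bit(w)
print(np.abs(w - dequantize_2bit(s, p)).max())  # error on the order of the scale
```

The hard part of a real kernel is doing this unpacking inside the matmul inner loop without killing throughput, which is why formats like these need dedicated llama.cpp support rather than just a converter.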
