AmberSafe
We present AmberSafe, a safety-finetuned instruction model using LLM360/AmberChat as the base. AmberSafe is part of LLM360's Pebble model series.
Model Description
- Model type: Language model with the same architecture as LLaMA-7B
- Language(s) (NLP): English
- License: Apache 2.0
- Resources for more information:
Loading AmberSafe
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
tokenizer = LlamaTokenizer.from_pretrained("LLM360/AmberSafe")
model = LlamaForCausalLM.from_pretrained("LLM360/AmberSafe")
# Template adapted from FastChat
template = "###Human: {prompt}\n###Assistant:"
prompt = "How do I mount a tv to drywall safely?"
input_str = template.format(prompt=prompt)
input_ids = tokenizer(input_str, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=1000)
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
Alternatively, you may use FastChat:
python3 -m fastchat.serve.cli --model-path LLM360/AmberSafe
AmberSafe Finetuning Details
DataMix
| Subset | Number of rows | License |
|---|---|---|
| PKU-Alignment/PKU-SafeRLHF | 330k | cc-by-nc-4.0 |
| Total | 330k | |
Data Preprocessing
We filtered the dataset to keep only the samples whose is_response_0_safe and is_response_1_safe values differ. This ensures that, for each pair in the preference dataset, the chosen text is safe and the rejected one is unsafe.
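The filter described above can be sketched in plain Python as follows. This is a minimal illustration rather than the preprocessing script actually used: the `is_response_0_safe` and `is_response_1_safe` fields come from the description, while the `prompt`, `response_0`, and `response_1` field names and the `to_preference_pairs` helper are assumptions made for the example.

```python
from typing import Dict, List


def to_preference_pairs(rows: List[Dict]) -> List[Dict]:
    """Keep rows where exactly one response is safe and emit chosen/rejected pairs.

    Assumes each row has `prompt`, `response_0`, `response_1` (hypothetical
    field names) plus the two boolean safety flags named in the text.
    """
    pairs = []
    for row in rows:
        safe0 = row["is_response_0_safe"]
        safe1 = row["is_response_1_safe"]
        if safe0 == safe1:
            # Both safe or both unsafe: no safe-vs-unsafe preference signal.
            continue
        if safe0:
            chosen, rejected = row["response_0"], row["response_1"]
        else:
            chosen, rejected = row["response_1"], row["response_0"]
        pairs.append(
            {"prompt": row["prompt"], "chosen": chosen, "rejected": rejected}
        )
    return pairs
```

Rows where both responses share the same safety label are dropped, so every remaining pair has a safe chosen response and an unsafe rejected one.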
Method
We followed the instructions in the dpo repo to finetune this model.
1. Run supervised fine-tuning (SFT) on the dataset(s) of interest.
2. Run preference learning on the model from step 1, using preference data (ideally from the same distribution as the SFT examples).
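For readers unfamiliar with the preference-learning step, the per-pair DPO objective can be sketched as below. This is an illustrative sketch of the standard DPO loss, not code from the dpo repo; the function name `dpo_loss`, its arguments, and the `beta=0.1` default are assumptions, and the value of beta used for AmberSafe is not stated here.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, for illustration only
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under the frozen reference (SFT) model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference model, the margin is zero and the loss equals log 2; the loss falls as the policy assigns relatively more probability to the chosen (safe) response than the reference does.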
Evaluation
| Model | MT-Bench |
|---|---|
| LLM360/Amber 359 | 2.48750 |
| LLM360/AmberChat | 5.428125 |
| LLM360/AmberSafe | 4.725000 |
Using Quantized Models with Ollama
Please follow these steps to use a quantized version of AmberSafe on your personal computer or laptop:
First, install Ollama by following the instructions provided here. Next, create a quantized version of the AmberSafe model (for example, ambersafe.Q8_0.gguf for the 8-bit quantized version) following the instructions here. Alternatively, you can download the 8-bit quantized version that we created: ambersafe.Q8_0.gguf.
Create an Ollama Modelfile locally using the template provided below:
FROM ambersafe.Q8_0.gguf
TEMPLATE """{{ .System }}
USER: {{ .Prompt }}
ASSISTANT:
"""
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
"""
PARAMETER stop "USER:"
PARAMETER stop "ASSISTANT:"
PARAMETER repeat_last_n 0
PARAMETER num_ctx 2048
PARAMETER seed 0
PARAMETER num_predict -1
Ensure that the FROM directive points to the created checkpoint file.
- Now, you can proceed to build the model by running:
ollama create ambersafe -f Modelfile
- To run the model from the command line, execute the following:
ollama run ambersafe
You only need to build the model once; afterwards, you can run it directly.
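Once the model is built, it can also be queried programmatically through Ollama's local REST API. The sketch below assumes the Ollama server is running on its default address (`http://localhost:11434`) and that the model was created under the name `ambersafe` as above; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "ambersafe") -> urllib.request.Request:
    """Build a non-streaming generate request for the Ollama REST API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "ambersafe") -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("How do I mount a tv to drywall safely?")` would return the model's reply as a string. The prompt template and system message from the Modelfile are applied by Ollama, so only the raw user prompt is sent.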
Citation
BibTeX:
@misc{liu2023llm360,
title={LLM360: Towards Fully Transparent Open-Source LLMs},
author={Zhengzhong Liu and Aurick Qiao and Willie Neiswanger and Hongyi Wang and Bowen Tan and Tianhua Tao and Junbo Li and Yuqi Wang and Suqi Sun and Omkar Pangarkar and Richard Fan and Yi Gu and Victor Miller and Yonghao Zhuang and Guowei He and Haonan Li and Fajri Koto and Liping Tang and Nikhil Ranjan and Zhiqiang Shen and Xuguang Ren and Roberto Iriondo and Cun Mu and Zhiting Hu and Mark Schulze and Preslav Nakov and Tim Baldwin and Eric P. Xing},
year={2023},
eprint={2312.06550},
archivePrefix={arXiv},
primaryClass={cs.CL}
}