zhangnju commented:

When running the official bitsandbytes int8 benchmark (https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/benchmarking/int8/int8_benchmark.py), or the sample code below for int8 quantization:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the tokenizer and an 8-bit quantized model via bitsandbytes
base_model_name = "/models/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
bnb_model_8bit = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    quantization_config=quantization_config,
)

# Generate from a short prompt (on ROCm, the int8 matmuls go through hipBLASLt)
prompt = "What is a large language model?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
generated_ids = bnb_model_8bit.generate(**inputs)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
```

we hit the following hipBLASLt matmul error:

[screenshot: hipBLASLt matmul failure]

This patch fixes the issue by 1) removing rocBLAS from the Context class, since only hipBLASLt is used in the ROCm build of bitsandbytes, and 2) passing a workspace pointer to the hipBLASLt matmul call. The patch also adds more log output to help with debugging.
After applying this patch, both the sample code above and the int8 benchmark run successfully; a minimal sketch of the workspace change is shown below.
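
For concreteness, here is a minimal C++ sketch of what the two changes amount to. This is not the actual patch: `Context`, `gemm_int8`, the descriptor arguments, and the 32 MiB workspace size are hypothetical stand-ins, while the `hipblasLt*` calls themselves are the real hipBLASLt API.

```cpp
// Sketch only: illustrates the two changes described above, not the exact patch.
#include <hip/hip_runtime.h>
#include <hipblaslt/hipblaslt.h>
#include <cstdio>

// Hypothetical Context: after the change it holds only a hipBLASLt handle
// (the unused rocBLAS handle is removed), plus a device workspace buffer.
struct Context {
    hipblasLtHandle_t lt_handle = nullptr;
    void*  workspace      = nullptr;
    size_t workspace_size = 32 * 1024 * 1024;  // hypothetical 32 MiB workspace

    Context() {
        hipblasLtCreate(&lt_handle);
        hipMalloc(&workspace, workspace_size);  // scratch space hipBLASLt may use
    }
    ~Context() {
        hipFree(workspace);
        hipblasLtDestroy(lt_handle);
    }
};

// Hypothetical int8 GEMM wrapper. The key change is in the last three
// arguments of hipblasLtMatmul: a real workspace pointer and its size
// instead of nullptr / 0.
hipblasStatus_t gemm_int8(Context& ctx,
                          hipblasLtMatmulDesc_t op_desc,
                          const void* alpha,
                          const void* A, hipblasLtMatrixLayout_t A_desc,
                          const void* B, hipblasLtMatrixLayout_t B_desc,
                          const void* beta,
                          void* C, hipblasLtMatrixLayout_t C_desc,
                          const hipblasLtMatmulAlgo_t* algo,
                          hipStream_t stream) {
    hipblasStatus_t status = hipblasLtMatmul(
        ctx.lt_handle, op_desc,
        alpha, A, A_desc, B, B_desc,
        beta,  C, C_desc,    // C used as both C and D for an in-place result
               C, C_desc,
        algo,
        ctx.workspace,       // <-- previously nullptr: pass a real workspace
        ctx.workspace_size,  // <-- previously 0
        stream);
    if (status != HIPBLAS_STATUS_SUCCESS)
        fprintf(stderr, "hipblasLtMatmul failed: %d\n", (int)status);  // extra debug logging
    return status;
}
```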
