How to use quantized version?

by jawad1347 - opened Jun 22, 2024

Discussion

jawad1347

Jun 22, 2024

Could you please share code for loading it in Colab with 4-bit quantization? Thanks.

yliu279

Salesforce AI Research org Jun 26, 2024

•

edited Jun 26, 2024

Sure, you can load it in 4-bit with BitsAndBytes:

            import torch
            from transformers import AutoModel, BitsAndBytesConfig

            # Ref: https://huggingface.co/blog/4bit-transformers-bitsandbytes
            quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,                      # store the weights in 4-bit
                bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
                bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
                bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for computation
            )
            model_kwargs = {}  # any extra from_pretrained arguments you need
            model = AutoModel.from_pretrained(
                'Salesforce/SFR-Embedding-2_R',
                device_map='auto',
                trust_remote_code=True,
                quantization_config=quantization_config,
                **model_kwargs,
            )
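If you want a rough idea of whether the 4-bit weights will fit on a Colab GPU, a back-of-the-envelope estimate like the one below may help. This is only a sketch: the parameter count of about 7e9 is an assumption (check the model card for the real number), and the overhead figures assume a quantization block size of 64 with a second-level block of 256 for double quantization. It counts the weights only, so activations, layers that stay unquantized, and CUDA overhead come on top.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory taken by the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: a 7B-class model. Replace with the real parameter count.
N_PARAMS = 7e9

# NF4 stores 4 bits per weight plus per-block quantization constants.
# Assumed block size 64 with fp32 constants: 32 / 64 = 0.5 extra bits per weight.
NF4_BITS = 4 + 32 / 64

# Double quantization (assumed: 8-bit constants, second-level block of 256)
# shrinks that overhead to 8 / 64 + 32 / (64 * 256) bits per weight.
NF4_DOUBLE_QUANT_BITS = 4 + 8 / 64 + 32 / (64 * 256)

estimates = {
    "fp16/bf16": weight_memory_gib(N_PARAMS, 16),
    "nf4": weight_memory_gib(N_PARAMS, NF4_BITS),
    "nf4 + double quant": weight_memory_gib(N_PARAMS, NF4_DOUBLE_QUANT_BITS),
}

for name, gib in estimates.items():
    print(f"{name}: {gib:.2f} GiB (weights only)")
```

Treat the numbers as a lower bound; the actual footprint reported by the GPU will be higher.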

prudant

Jun 28, 2024

Can it be quantized to GPTQ or AWQ, or are those formats not compatible with this architecture?

yliu279

Salesforce AI Research org Jul 1, 2024

•

edited Jul 1, 2024

Hi @prudant ,

Of course, it can be used with other quantization methods as well.
