Calling a GPTQ LLM in a script

I have this function, which loads either a GGUF or a GPTQ model.

from pathlib import Path

# Assumption: AutoModelForCausalLM comes from ctransformers here, since the
# model_type, gpu_layers and context_length arguments below match its API.
from ctransformers import AutoModelForCausalLM

# Loading the LLM model
def load_llm(temperature: float = 0.8,
             top_p: float = 0.95,
             top_k: int = 40,
             max_new_tokens: int = 1000,
             context_length: int = 6000,
             repetition_penalty: float = 1.1):

    model_dir = Path.cwd() / 'model'
    model_name_gguf = 'openhermes-2.5-mistral-7b.Q4_K_M.gguf'
    model_name_gptq = 'Mistral-7B-Instruct-v0.2-DARE-GPTQ'

    # Check if the GGUF model exists (a single .gguf file)
    if (model_dir / model_name_gguf).is_file():
        model_name = model_name_gguf
        model_path = model_dir / model_name
    # If not, check if the GPTQ model exists; it is a directory, so is_file()
    # would always be False here and the folder has to be tested instead
    elif (model_dir / model_name_gptq).is_dir():
        model_name = model_name_gptq
        model_path = model_dir / model_name
        print(model_path)
    else:
        raise FileNotFoundError("No valid model found in the model directory")
    
    model = AutoModelForCausalLM.from_pretrained(
        str(model_path),
        model_type="mistral",
        gpu_layers=50,
        temperature=temperature,  # default is 0.8
        top_p=top_p,
        top_k=top_k,  # default is 40
        max_new_tokens=max_new_tokens,
        context_length=context_length,
        repetition_penalty=repetition_penalty)
    
    print(f"Using loaded model: {model_name}")
    
    return model
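
For context, this is roughly how I call the function elsewhere in the script (the prompt string is just a placeholder); ctransformers model objects are callable for text generation:

llm = load_llm(temperature=0.7, max_new_tokens=256)
print(llm("Explain GPTQ quantization in one sentence."))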

The GPTQ model files live in a folder called Mistral-7B-Instruct-v0.2-DARE-GPTQ, whereas the GGUF model is just a single file, openhermes-2.5-mistral-7b.Q4_K_M.gguf. I get an error when I try to load the GPTQ model, and I suspect it is because I'm referring to a folder rather than a model file. The GPTQ folder contains:

config.json
model.safetensors
quantize_config.json
special_tokens_map.json
tokenizer_config.json
tokenizer.json
tokenizer.model
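
From what I've gathered, ctransformers treats GPTQ support as experimental and it needs an extra dependency (pip install ctransformers[gptq]); the examples pass the model directory straight to from_pretrained and let the library detect the files. A minimal sketch of what I think that would look like with my local folder (an untested assumption on my part):

# Assumption: ctransformers was installed with GPTQ support (ctransformers[gptq])
from pathlib import Path
from ctransformers import AutoModelForCausalLM

gptq_dir = Path.cwd() / 'model' / 'Mistral-7B-Instruct-v0.2-DARE-GPTQ'

# Pass the folder itself, not a file inside it; ctransformers should pick up
# model.safetensors and quantize_config.json from the directory
llm = AutoModelForCausalLM.from_pretrained(str(gptq_dir))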

Do any of you know how I would refer to the GPTQ model correctly? Thanks in advance!!
