LLM prompt eval time is fast for similar prompts but slow for different prompts
I am using the LangChain llama.cpp integration to run a large language model (LLM) locally. When I ask the model a question with a long prompt, the response time is normal. However, if I then ask the model a question with a prompt that is similar to the first prompt but with some …
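
The behavior described is consistent with llama.cpp's KV-cache reuse: the `llama-cpp-python` backend keeps the KV cache from the previous call and only re-evaluates tokens after the longest prefix shared with the new prompt, so a prompt that shares a long prefix with the previous one skips most of the prompt eval. Below is a minimal sketch (not the asker's actual code; the model path is a placeholder) that should reproduce the timing difference with LangChain's `LlamaCpp` wrapper. With `verbose=True`, llama.cpp prints a timing breakdown after each call, including `prompt eval time`:

```python
from langchain_community.llms import LlamaCpp

# verbose=True makes llama.cpp print per-call timings,
# including the "prompt eval time" line in question.
llm = LlamaCpp(
    model_path="./models/model.gguf",  # placeholder path to a local GGUF model
    n_ctx=4096,
    verbose=True,
)

# A long shared prefix, e.g. instructions or retrieved context.
long_prefix = "You are a helpful assistant. Context: ... (long text) ...\n"

# Call 1: the entire prompt must be evaluated -> slow prompt eval.
print(llm.invoke(long_prefix + "Question: What is X?"))

# Call 2: shares the long prefix with call 1, so the cached KV entries
# are reused and only the differing suffix is evaluated -> fast.
print(llm.invoke(long_prefix + "Question: What is Y?"))

# Call 3: no shared prefix, so the cache cannot be reused and the
# whole prompt is evaluated again -> slow, as in the question.
print(llm.invoke("Summarize the history of aviation in two sentences."))
```

If this matches what is observed, the fix for "different" prompts is structural rather than a setting: keep the stable part of the prompt (system instructions, shared context) as a common prefix and put the varying part at the end, so consecutive calls can hit the prefix cache.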