Since LangChain was released, I have been able to vectorize (embed) my own documents, such as PDF and Word files, and then chat with them using large language models such as OpenAI's. The results are quite good, but only when the source text is of relatively high quality. Now I want to do the same with my chat records, which are Q&A conversations between human customer-service agents and customers, in the hope of reducing manual work. However, the results are poor.
LangChain already provides text-splitting tools such as RecursiveCharacterTextSplitter, but it doesn't work well on my chat records.
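One alternative to splitting dialogue by character count is to keep each customer question together with the agent reply that follows it, so that every chunk is a complete Q&A exchange. Below is a minimal plain-Python sketch of that idea. It assumes a hypothetical log format in which every message line starts with "Customer:" or "Agent:"; the function name split_by_qa_turn is also hypothetical, and this is not a LangChain API.

```python
import re
from typing import List, Optional

# Hypothetical log format: each message line starts with "Customer:" or "Agent:".
SPEAKER = re.compile(r"^(Customer|Agent):\s*(.*)$")


def split_by_qa_turn(log: str) -> List[str]:
    """Group a chat log into chunks, each starting at a customer message
    and including the agent replies that follow it."""
    chunks: List[List[str]] = []
    last_speaker: Optional[str] = None
    for raw in log.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        m = SPEAKER.match(line)
        if m is None:
            # Untagged line: treat it as a continuation of the previous message
            # (ignored if it appears before any tagged line).
            if chunks:
                chunks[-1].append(line)
            continue
        speaker = m.group(1)
        # Open a new chunk when the customer speaks after the agent (or at the start).
        if speaker == "Customer" and last_speaker != "Customer":
            chunks.append([])
        if not chunks:
            # Log starts with an agent message: open a chunk for it anyway.
            chunks.append([])
        chunks[-1].append(line)
        last_speaker = speaker
    return ["\n".join(c) for c in chunks]


if __name__ == "__main__":
    log = """Customer: How do I reset my password?
Agent: Click "Forgot password" on the login page.
Customer: Thanks. Can I change my email too?
Agent: Yes, under Settings > Account."""
    chunks = split_by_qa_turn(log)
    print(len(chunks))  # prints 2
```

Each resulting string could then be embedded as one unit, instead of being cut at an arbitrary character boundary that may separate a question from its answer.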