I have a preprocessed text file from a PDF that I would like to insert directly into the LLM context. I know RAG is generally what is used here, but Gemini offering a million-token context window makes me want to try this first before going down the RAG route. I am processing a document regarding laws, and it is about 100k tokens. I just want to put it into the model at the beginning of any chat and let the system prompt do the heavy lifting as the user asks questions.
What is the error message (if any)?
Please share your workflow
Chat → AI Agent so far, with Gemini as the model.
(Select the nodes on your canvas and use the keyboard shortcuts CMD+C/CTRL+C and CMD+V/CTRL+V to copy and paste the workflow.)
The system prompt is sent with every message to the user (at least, according to the logs). If I have a 100k-token system message, this will balloon. Ideally I just need it once in the message/context history, with the system message driving the AI to focus on the prior context. Kind of like attaching a PDF/text file at the beginning and then having a conversation flow from it.
Is it possible to send a "mock" message before the user's first message to get it into their message history?
Yeah, you could force-send a message as a "user" or "assistant" type in some cases, and that would remain as an initial message within the history if set up right.
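To make the idea concrete, here is a minimal sketch of seeding the history with a mock first exchange, so the large document lives in the stored history once while the per-turn system prompt stays short. All names here (`build_messages`, `DOC_TEXT`, the acknowledgement text) are illustrative assumptions, not n8n or Gemini APIs:

```python
# Hypothetical sketch: seed the chat history once with the document,
# so only a short system prompt is re-sent each turn.
SYSTEM_PROMPT = "Answer strictly from the attached legal document."
DOC_TEXT = "...the ~100k-token preprocessed PDF text..."

history = [
    # Mock first exchange: the document goes in as a "user" message,
    # with a short acknowledgement as the "assistant" reply.
    {"role": "user", "content": DOC_TEXT},
    {"role": "assistant", "content": "Understood. Ask me about the document."},
]

def build_messages(history, user_input):
    """Assemble one request: system prompt + stored history + new question."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": user_input}]
    )

msgs = build_messages(history, "What does the document say about liability?")
```

The document is only stored once; each turn just appends the real user/assistant messages to `history`.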
I believe there is a better way, but you could, for example, make the initial request in an OpenAI node, grab its session ID, and provide that to the agent node's memory instead of its own session ID.
The system prompt is sent with every message to the user
The system prompt gets written to the message history on the first interaction and is retrieved for each subsequent interaction. It isn't added to the memory again on subsequent interactions, but is already there.
Memory is how you give context to the model for subsequent interactions. If the system message wasn't sent each time, the model wouldn't have access to it and wouldn't be able to answer anything about it.
You could, however, manipulate the memory before calling the agent if you want to. If you're using the Simple Memory (what used to be called "Window Buffer Memory"), you can use the "Chat Memory Manager" node to do this.
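Manipulating the memory before the first agent call might look like the following sketch: insert the big document into the stored history exactly once, guarded by a marker so repeated runs don't duplicate it. The function and marker names are hypothetical, not n8n node internals:

```python
# Hedged sketch: seed the stored history with the document once,
# before the agent ever runs, so it is never re-added.
def seed_memory_once(memory, doc_text, marker="[[DOC]]"):
    """Append the document as a user message only if it isn't already there."""
    already_seeded = any(marker in m["content"] for m in memory)
    if not already_seeded:
        memory.append({"role": "user", "content": marker + "\n" + doc_text})
        memory.append({"role": "assistant", "content": "Document received."})
    return memory

memory = []
seed_memory_once(memory, "full legal text ...")
seed_memory_once(memory, "full legal text ...")  # no-op: already seeded
```

The same guard idea applies whatever node actually writes to the memory: check for the document first, insert only if absent.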
It is effectively sent with all requests, since it's at the start of the memory and the contents of memory are added in for all requests. But it's not added an extra time to the memory for each interaction; it's only in there once.
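That behavior can be sketched as a toy buffer memory: the system message is written to memory once at setup, and every request simply replays the whole memory. The class and method names are illustrative only, not how n8n implements its memory:

```python
# Toy model of the behavior described above: stored once, re-sent every turn.
class WindowBufferMemory:
    def __init__(self, system_message):
        # The system message is written to memory exactly once, up front.
        self.messages = [{"role": "system", "content": system_message}]

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def build_request(self, user_input):
        # Every request re-sends the full memory contents plus the new input.
        return self.messages + [{"role": "user", "content": user_input}]

mem = WindowBufferMemory("You are a legal assistant.")
req1 = mem.build_request("First question")
mem.add("user", "First question")
mem.add("assistant", "First answer")
req2 = mem.build_request("Second question")
```

Both `req1` and `req2` contain the system message exactly once, even though it was only ever added to the memory a single time.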