What is the best way to include attachments of varying types (PDF, Excel, JPG, PNG, Word documents) in an AI agent's input? Is it better or cheaper to handle everything inside the AI agent, or should I separate it out and build a workflow just for parsing documents into text, then merge the results and append them to the AI agent's input as extra context from the documents?
It depends on your use case. If you’re dealing with small, simple files (like short PDFs or clean Word docs), you can safely send them directly to the LLM. Modern multimodal models like GPT-4o or Claude can handle such inputs effectively for quick, single-shot reasoning.
However, as a rule of thumb, it’s usually better to separate document parsing from the AI agent. Building a preprocessing layer lets you extract text from PDFs, Excel sheets, and images (using OCR), clean and chunk it, then feed only the relevant parts to the model. Because the model then sees fewer tokens per request, your system becomes faster, cheaper, and more scalable, especially when handling larger or mixed-format files.
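To make the preprocessing idea concrete, here is a minimal sketch of such a layer in Python, using only the standard library. Everything in it is an assumption for illustration: the names `EXTRACTORS`, `extract_text`, `chunk_text`, `select_relevant`, and `build_context` are hypothetical, only a plain-text extractor is registered, and relevance is scored by simple keyword overlap. In a real workflow you would register your own PDF, Excel, Word, and OCR extractors, and probably swap the keyword scoring for embeddings.

```python
from pathlib import Path
from typing import Callable, Dict, List

# An extractor turns raw file bytes into plain text.
Extractor = Callable[[bytes], str]


def extract_plain_text(data: bytes) -> str:
    """Decode bytes as UTF-8 text, replacing undecodable bytes."""
    return data.decode("utf-8", errors="replace")


# Hypothetical registry: file extension -> extractor function.
# Register your own PDF / Excel / Word / OCR extractors here.
EXTRACTORS: Dict[str, Extractor] = {
    ".txt": extract_plain_text,
}


def extract_text(filename: str, data: bytes) -> str:
    """Pick an extractor by file extension and collapse whitespace."""
    ext = Path(filename).suffix.lower()
    extractor = EXTRACTORS.get(ext)
    if extractor is None:
        raise ValueError(f"No extractor registered for {ext!r}")
    return " ".join(extractor(data).split())


def chunk_text(text: str, max_words: int = 200) -> List[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def select_relevant(chunks: List[str], query: str, top_k: int = 3) -> List[str]:
    """Keep the top_k chunks sharing the most words with the query.

    Keyword overlap is a stand-in for a real relevance method.
    """
    query_terms = set(query.lower().split())
    scored = []
    for index, chunk in enumerate(chunks):
        overlap = len(query_terms & set(chunk.lower().split()))
        if overlap > 0:
            scored.append((overlap, index, chunk))
    # Highest overlap first; ties keep original document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [chunk for _, _, chunk in scored[:top_k]]


def build_context(attachments: Dict[str, bytes], query: str,
                  max_words: int = 200, top_k: int = 3) -> str:
    """Extract, chunk, and filter all attachments into one context string."""
    chunks: List[str] = []
    for filename, data in attachments.items():
        chunks.extend(chunk_text(extract_text(filename, data), max_words))
    return "\n\n".join(select_relevant(chunks, query, top_k))
```

The string returned by `build_context` is what you would append to the agent's prompt. Keeping extraction in a registry means adding a new file type is one dictionary entry, and the agent itself never needs to know which formats the attachments came in.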