Want to parse multiple .htm files to retrieve text + images

Describe the problem/error/question

I want to build a knowledge base for an error-handling agent that I'm working on. As a first step, I'm trying to build my vector DB from the error codes and their details, which I have in .htm files. I'm running the n8n workflow locally because my client does not allow internet access for this purpose. I have all the data needed to generate the vector DB for the knowledge base, but I'm not able to get it working: I have been able to retrieve the text from the .htm file, but not the images. I have uploaded a screenshot of my .htm file. Some error codes have multiple images, some have only one. I want to save text + images in the DB so that when I retrieve a result and pass it to my model, I can display both the image and the text. For retrieval and chat I have created a separate workflow. Let me know if that is also needed and I will upload it.

The way to do this is to store the images separately, either in a storage bucket or as blobs in the database, and then save each image's path/URL or table ID in the metadata field when you store the vectors. When you search the vector store, you can read the metadata field of each result to find any associated resources. Hope this makes sense.
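
To make that concrete, here is a minimal Python sketch of the parsing side, using only the standard library. It is an illustration under assumptions, not a tested n8n node: the names `TextAndImageParser`, `build_record`, and the `images/` base path are hypothetical, and the images are assumed to have already been copied to wherever that base path points (a local folder, bucket, or blob table). The same logic could be ported to JavaScript for an n8n Code node.

```python
from html.parser import HTMLParser


class TextAndImageParser(HTMLParser):
    """Collects visible text and <img> src values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_srcs = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_srcs.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def build_record(html, source_name, image_base="images/"):
    """Return the text to embed plus metadata pointing at the stored images.

    image_base is a hypothetical location where the image files were saved
    separately; only the file name of each original src is kept.
    """
    parser = TextAndImageParser()
    parser.feed(html)
    parser.close()
    file_names = [
        src.replace("\\", "/").rsplit("/", 1)[-1] for src in parser.image_srcs
    ]
    return {
        "text": " ".join(parser.text_parts),
        "metadata": {
            "source": source_name,
            "image_paths": [image_base + name for name in file_names],
        },
    }
```

The `text` value is what gets embedded, and `metadata` goes into the vector store's metadata field alongside it. An error code with several images simply ends up with several entries in `image_paths`, and the retrieval workflow can read that list from each search result to display the images next to the text.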
