Hi everyone,
I'm working on an automation in n8n where I:
- Trigger with a schedule
- Use the Apify actor to fetch scraped data
- Format the result in a Code node
- Then pass the content to OpenAI Embeddings
- Finally, store the vectors in Pinecone
Code inside my Code node:
const data = $input.item.json;
if (!data || !data.document || !data.url) {
  return {
    json: {
      pageContent: '',
      metadata: {}
    }
  };
}
const content = data.document;
const url = data.url;
if (typeof content !== 'string' || content.trim().length < 10) {
  return {
    json: {
      pageContent: '',
      metadata: {}
    }
  };
}
const maxLength = 6000;
const finalContent = content.length > maxLength
  ? content.substring(0, maxLength) + '...'
  : content;
const result = {
  pageContent: finalContent,
  metadata: {
    url: url,
    source: 'apify',
    title: url.replace(/^https?:\/\//, '').split('/')[0],
    originalSize: content.length,
    processedSize: finalContent.length,
    timestamp: new Date().toISOString()
  }
};
return { json: result };
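One thing worth noting about the code above: items that fail validation are still returned, just with an empty `pageContent`, so empty strings continue on to the embedding step. Below is a minimal sketch of a variant that drops such items instead. It assumes the Code node is switched to "Run Once for All Items" mode so that `$input.all()` is available. `toDocument` is a hypothetical helper name, not part of n8n, and the metadata is abbreviated for brevity.

```javascript
// Hypothetical helper: builds a document object, or returns null
// when the scraped item is missing, too short, or has no URL.
function toDocument(data, maxLength = 6000) {
  if (!data || typeof data.document !== 'string' || !data.url) return null;
  const content = data.document;
  if (content.trim().length < 10) return null;

  // Same truncation rule as the original code.
  const finalContent = content.length > maxLength
    ? content.substring(0, maxLength) + '...'
    : content;

  return {
    pageContent: finalContent,
    metadata: {
      url: data.url,
      source: 'apify',
      originalSize: content.length,
      processedSize: finalContent.length,
    },
  };
}

// Inside the n8n Code node (assumed mode: "Run Once for All Items"):
// return $input.all()
//   .map(item => toDocument(item.json))
//   .filter(doc => doc !== null)
//   .map(doc => ({ json: doc }));
```

With this shape, only items that actually contain text reach the embedding step, which should make it easier to tell whether the failure comes from the data or from the node wiring.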
PROBLEM:
The embedding isn't working: Pinecone receives no vector or fails to store the data.
I suspect that the format passed to the OpenAI Embedding node or Pinecone might be incorrect.
MY QUESTION:
How should I structure my nodes after the Code node to ensure:
- OpenAI receives only `pageContent` for embedding
- Pinecone receives the proper `id`, embedding vector, and `metadata`
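To make the target concrete, here is a minimal sketch of the record described in the list above. It assumes Pinecone's usual upsert shape of `id`, `values` (the embedding vector), and `metadata`. `toPineconeRecord` is a hypothetical helper. Using the page URL as the `id` and storing the text under a `text` metadata key are also assumptions made for illustration. As far as I understand, the n8n Pinecone Vector Store node may build this record internally when an embeddings sub-node is attached, so this only shows the intended end result.

```javascript
// Hypothetical helper: combines a document and its embedding vector
// into the { id, values, metadata } shape assumed for a Pinecone upsert.
function toPineconeRecord(doc, embedding) {
  if (!Array.isArray(embedding) || embedding.length === 0) {
    throw new Error('embedding must be a non-empty array of numbers');
  }
  return {
    // Assumption: one record per URL, so re-runs overwrite instead of duplicating.
    id: doc.metadata.url,
    values: embedding,
    // Assumption: keep the source text alongside the other metadata.
    metadata: { ...doc.metadata, text: doc.pageContent },
  };
}

// Usage with a placeholder vector (real embeddings have many more dimensions).
const record = toPineconeRecord(
  {
    pageContent: 'Example page text',
    metadata: { url: 'https://example.com/a', source: 'apify' },
  },
  [0.1, 0.2, 0.3]
);
```

Only `pageContent` would be sent to the embedding model. The resulting vector goes into `values`, and everything else travels as `metadata`.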
Any working example or clean structure would help me a lot.
