[Help Needed] Extracting data from HTML through code block

ConversationalizeAI · January 4, 2025, 12:17pm

I am doing a Google search and using this code
// Get all input items
const items = $input.all();
const allExtractedData = [];

// Loop through each item
for (const item of items) {
    try {
        // Get HTML content, handle both string and object cases
        let htmlContent = item.json;
        if (typeof htmlContent === 'object' && htmlContent.data) {
            htmlContent = htmlContent.data;
        }

        const htmlString = typeof htmlContent === 'string'
            ? htmlContent
            : JSON.stringify(htmlContent);

        const regex = /https:\/\/www\.reddit\.com\/r\/([\w\d_]+)\/comments\/([\w\d_]+)/g;
        let matches;

        // Extract subreddit and comment ID
        while ((matches = regex.exec(htmlString)) !== null) {
            allExtractedData.push({
                subreddit: matches[1],
                id: matches[2]
            });
        }
    } catch (error) {
        console.error('Error processing item:', item, error);
    }
}

// Remove duplicates (keyed on "subreddit-id")
const uniqueData = [
    ...new Map(allExtractedData.map(item =>
        [`${item.subreddit}-${item.id}`, item]
    )).values()
];

// Map the result for the desired output format
const result = uniqueData.map(data => ({ json: data }));

// Log the final result
console.log(result);

return result;
to extract the information.
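For testing outside n8n, the extraction step can be pulled into a plain function. The sketch below is an alternative to the `while`/`exec` loop, not a confirmed fix for the hang: it uses `String.prototype.matchAll`, which manages the regex's `lastIndex` internally. The name `extractMatches` and the sample HTML are hypothetical, and `\w+` stands in for `[\w\d_]+` because `\w` already covers digits and underscores.

```javascript
// Hypothetical helper: extract { subreddit, id } pairs from one HTML string.
// matchAll requires the g flag and iterates over every match without a
// manual while/exec loop; it works on an internal copy of the regex.
const REDDIT_POST_RE = /https:\/\/www\.reddit\.com\/r\/(\w+)\/comments\/(\w+)/g;

function extractMatches(html) {
  const results = [];
  for (const m of html.matchAll(REDDIT_POST_RE)) {
    results.push({ subreddit: m[1], id: m[2] });
  }
  return results;
}

// Usage outside n8n, e.g. in Node.js, with made-up sample HTML
const sample =
  '<a href="https://www.reddit.com/r/n8n/comments/abc123/title">x</a>' +
  '<a href="https://www.reddit.com/r/javascript/comments/xyz789/">y</a>';
console.log(extractMatches(sample));
```

Inside the Code node, the same function could be called on each item's HTML string in place of the `while` loop.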

Everything runs fine the first few times, but then the execution hangs indefinitely and never gets past this Code node.

Surprisingly, when I create a new account and run the same code, it works fine the first few times and then shows the same issue.

Has anyone faced the same problem, or is this a way for n8n to limit usage of the Code node?

n8n · January 4, 2025, 12:17pm

It looks like your topic is missing some important information. Could you provide the following, if applicable:

n8n version:
Database (default: SQLite):
n8n EXECUTIONS_PROCESS setting (default: own, main):
Running n8n via (Docker, npm, n8n cloud, desktop app):
Operating system:

ria · January 20, 2025, 4:05pm

Hi @ConversationalizeAI

This could be down to memory issues. Looking at your code, you're accumulating all matches in memory before deduplicating them. If there are many matches, this could degrade performance and eventually bring the execution down, as you described.
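A minimal sketch of deduplicating while scanning rather than afterwards, assuming the input has already been reduced to HTML strings: a `Set` of `subreddit-id` keys records what has been seen, so duplicate matches are never stored. The function name `collectUnique` and the sample strings are hypothetical, and whether this alone resolves the hang is not established in this thread.

```javascript
// Hypothetical helper: scan many HTML strings and keep only unique
// subreddit/ID pairs, skipping duplicates as they are found instead of
// building a full list first and deduplicating at the end.
function collectUnique(htmlStrings) {
  const re = /https:\/\/www\.reddit\.com\/r\/(\w+)\/comments\/(\w+)/g;
  const seen = new Set(); // "subreddit-id" keys already recorded
  const unique = [];
  for (const html of htmlStrings) {
    for (const m of html.matchAll(re)) {
      const key = `${m[1]}-${m[2]}`;
      if (!seen.has(key)) {
        seen.add(key);
        unique.push({ subreddit: m[1], id: m[2] });
      }
    }
  }
  return unique;
}

// Made-up input with a repeated link across and within strings
const pages = [
  'https://www.reddit.com/r/n8n/comments/abc123 https://www.reddit.com/r/n8n/comments/abc123',
  'https://www.reddit.com/r/n8n/comments/abc123 https://www.reddit.com/r/node/comments/q1',
];
console.log(collectUnique(pages));
```

In the Code node, the result would still need the `{ json: ... }` wrapping, e.g. `collectUnique(htmlStrings).map(data => ({ json: data }))`, where `htmlStrings` is built from `$input.all()` as in the original code.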

Please also bear in mind that the Code node is heavier on resources than other core nodes. Perhaps you could try the Summarize node and the Loop Over Items node to break the processing down into smaller steps?
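To illustrate the idea of smaller steps, here is a batching helper in plain JavaScript. It is only an analogy: inside a single Code node every item is already in memory, so any benefit would come from letting a node such as Loop Over Items feed one batch at a time into the Code node. The generator name `inBatches` is hypothetical.

```javascript
// Hypothetical helper illustrating fixed-size batching: yields consecutive
// slices of the input array, the last one possibly shorter. In n8n, the
// Loop Over Items node's batch size plays a similar role across nodes.
function* inBatches(array, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  for (let start = 0; start < array.length; start += batchSize) {
    yield array.slice(start, start + batchSize);
  }
}

const inputs = ["a", "b", "c", "d", "e"];
for (const batch of inBatches(inputs, 2)) {
  console.log(batch); // ["a","b"], then ["c","d"], then ["e"]
}
```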

Related topics

Topic	Category	Tags	Replies	Views	Activity
Unable to execute regex	Questions	data-transformation, javascript, code	9	2673	October 11, 2023
Extract data	Questions	html	5	255	December 8, 2023
Extracting html from website using a code node instead of an extract node	Questions		4	2272	May 3, 2023
Why does N8N break at 0.653 MB HTML?	Questions	html-extract	6	470	April 30, 2023
Getting single json items after the HTML extract node	Questions	http-request, code, html-extract	3	546	November 3, 2023