Scrape Blog Content from Any Website
Hello n8n World,
I want to feed links to blogs, articles, and news pages from around the internet into my n8n workflow and extract only the main article content from each URL. I am not from a coding background; I just got fascinated with n8n. (Please let me know if this is even possible without paid APIs, because I don't have any money.)
WORKFLOW:
From my research on the web, I understood that I can fetch the page with an HTTP Request node and parse it with HTML Extract. The problem is that every website needs a different "CSS Selector", which breaks the automated workflow. Please help.
My Workflow:
{
  "meta": {
    "instanceId": "70a07ce24cb8ce126d756c55af40fe2bf475685b06f76091b87bbc341095cc31"
  },
  "nodes": [
    {
      "parameters": {
        "operation": "extractHtmlContent",
        "extractionValues": {
          "values": [
            {
              "key": "=Article",
              "cssSelector": ".m-article__content"
            }
          ]
        },
        "options": {
          "cleanUpText": true
        }
      },
      "id": "2802bd86-592f-44f6-bf3f-5684fab8534e",
      "name": "HTML",
      "type": "n8n-nodes-base.html",
      "typeVersion": 1.2,
      "position": [
        2100,
        220
      ]
    },
    {
      "parameters": {
        "url": "https://medium.com/@youtubiiworkgmailcom/the-art-of-fragrance-a-guide-to-perfumes-6e60fa73785e",
        "sendHeaders": true,
        "specifyHeaders": "json",
        "jsonHeaders": "{\n \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36\",\n \"Accept\": \"text/html\",\n \"Accept-Language\": \"en-US,en;q=0.9\"\n}",
        "options": {}
      },
      "id": "8482b35d-53f0-4dca-b98b-bd912a0500bf",
      "name": "HTTP Request",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [
        1880,
        220
      ]
    }
  ],
  "connections": {
    "HTTP Request": {
      "main": [
        [
          {
            "node": "HTML",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "pinData": {}
}
OUTPUT:
I am hoping for an output that gives me just the plain body text instead of HTML, for any website.
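To make the goal concrete, here is a rough sketch of what selector-free extraction could look like, written in standalone Python using only the standard library. It is not an n8n feature and not a tested solution for every site. It assumes the page returns its article in the static HTML (no JavaScript rendering) and uses ordinary tags such as `<article>`, `<p>` and `<h1>`. The names `ArticleTextParser`, `extract_main_text`, `SKIP_TAGS` and `BLOCK_TAGS` are made up for this illustration.

```python
from html.parser import HTMLParser

# Assumption: text inside these tags is never part of the article body.
SKIP_TAGS = {"script", "style", "noscript", "nav", "footer", "aside", "form"}
# Assumption: article text lives in these block-level tags.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "li", "blockquote"}


class ArticleTextParser(HTMLParser):
    """Collects text blocks and remembers whether each sat inside <article>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0      # > 0 while inside a SKIP_TAGS element
        self.article_depth = 0   # > 0 while inside an <article> element
        self.collecting = False  # True while inside a BLOCK_TAGS element
        self.buffer = []
        self.blocks = []         # list of (inside_article, text) pairs

    def _flush(self):
        # Join the buffered pieces and collapse runs of whitespace.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append((self.article_depth > 0, text))
        self.buffer = []
        self.collecting = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "article":
            self._flush()
            self.article_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self._flush()  # also closes a previous block left unclosed
            self.collecting = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "article":
            self._flush()
            self.article_depth = max(0, self.article_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.collecting and self.skip_depth == 0:
            self.buffer.append(data)


def extract_main_text(html: str) -> str:
    """Return text blocks from <article> if present, else all text blocks."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    parser._flush()
    in_article = [text for inside, text in parser.blocks if inside]
    chosen = in_article or [text for _, text in parser.blocks]
    return "\n\n".join(chosen)


if __name__ == "__main__":
    sample = (
        "<html><body><nav><p>Home</p></nav>"
        "<article><h1>Title</h1><p>Some body text.</p></article>"
        "<footer><p>Copyright</p></footer></body></html>"
    )
    print(extract_main_text(sample))
```

The idea is to prefer whatever sits inside `<article>` and fall back to all paragraph-like blocks when a page has no `<article>` tag, so no per-site selector is needed. Pages that build their content with JavaScript or avoid these tags would still need something more capable.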
Information about my n8n setup
- n8n version: 1.69.2
- Running n8n via: npm
- Operating system: macOS Sonoma 14.1.2