HTML Extract

When I extract an array, the paragraph breaks are lost in the resulting text.

Welcome to the community.

Sorry, I do not understand. Can you please elaborate?

{
 "nodes": [
{
  "parameters": {},
  "name": "Start",
  "type": "n8n-nodes-base.start",
  "typeVersion": 1,
  "position": [
    250,
    300
  ]
},
{
  "parameters": {
    "url": "https://www.boe.es/buscar/act.php?id=BOE-A-2017-12902&p=20200506&tn=0",
    "responseFormat": "string",
    "jsonParameters": true,
    "options": {}
  },
  "name": "HTTP Request",
  "type": "n8n-nodes-base.httpRequest",
  "typeVersion": 1,
  "position": [
    470,
    300
  ]
},
{
  "parameters": {
    "extractionValues": {
      "values": [
        {
          "key": "artículo",
          "cssSelector": ".bloque",
          "returnValue": "html",
          "returnArray": true
        }
      ]
    },
    "options": {}
  },
  "name": "HTML Extract",
  "type": "n8n-nodes-base.htmlExtract",
  "typeVersion": 1,
  "position": [
    660,
    300
  ]
},
{
  "parameters": {
    "dataPropertyName": "artículo",
    "extractionValues": {
      "values": [
        {
          "key": "artículo",
          "cssSelector": ".articulo",
          "returnArray": true
        },
        {
          "key": "parrafo",
          "cssSelector": ".parrafo",
          "returnArray": true
        }
      ]
    },
    "options": {}
  },
  "name": "HTML Extract1",
  "type": "n8n-nodes-base.htmlExtract",
  "typeVersion": 1,
  "position": [
    860,
    300
  ]
},
{
  "parameters": {
    "keepOnlySet": true,
    "values": {
      "string": [
        {
          "name": "artículo",
          "value": "={{$json[\"artículo\"]}}"
        },
        {
          "name": "parrafo",
          "value": "={{$json[\"parrafo\"]}}"
        }
      ]
    },
    "options": {}
  },
  "name": "Set",
  "type": "n8n-nodes-base.set",
  "typeVersion": 1,
  "position": [
    1040,
    300
  ]
},
{
  "parameters": {
    "title": "={{$json[\"artículo\"]}}",
    "additionalFields": {
      "content": "={{$json[\"parrafo\"]}}",
      "status": "draft"
    }
  },
  "name": "Wordpress",
  "type": "n8n-nodes-base.wordpress",
  "typeVersion": 1,
  "position": [
    1250,
    300
  ],
  "credentials": {
    "wordpressApi": "Ley contratos sector publico"
  }
}
  ],
 "connections": {
"Start": {
  "main": [
    [
      {
        "node": "HTTP Request",
        "type": "main",
        "index": 0
      }
    ]
  ]
},
"HTTP Request": {
  "main": [
    [
      {
        "node": "HTML Extract",
        "type": "main",
        "index": 0
      }
    ]
  ]
},
"HTML Extract": {
  "main": [
    [
      {
        "node": "HTML Extract1",
        "type": "main",
        "index": 0
      }
    ]
  ]
},
"HTML Extract1": {
  "main": [
    [
      {
        "node": "Set",
        "type": "main",
        "index": 0
      }
    ]
  ]
},
"Set": {
  "main": [
    [
      {
        "node": "Wordpress",
        "type": "main",
        "index": 0
      }
    ]
  ]
}
}
}

I want to extract the content of a web page and import it into another site, but when I insert the extracted paragraphs, the paragraph breaks are not kept.

It looks like you have set “Return Value” to “Text” instead of “HTML” on “HTML Extract1”. That is how that mode works. It simply returns the text of an HTML element.

What I need is for the paragraphs to be kept when the extracted array is returned. Right now the items come back separated by a comma in HTML mode, or by a period in text mode.

In the example workflow you posted above, the data in “parrafo” is an array, so each paragraph is its own element and you can combine them any way you want.
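
As a rough illustration of where the commas come from: in plain JavaScript, an array that is converted to a string has its elements joined with commas. The sample values below are made up and only stand in for the “parrafo” array, and I am assuming this same conversion is what happens when the array lands in a text field.

```javascript
// Hypothetical sample data standing in for the "parrafo" array from "HTML Extract1".
const parrafo = ["First paragraph.", "Second paragraph."];

// Converting an array to a string joins its elements with a comma,
// which is why the paragraph boundaries seem to disappear.
const asString = String(parrafo);

console.log(asString);
```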

If the output is supposed to be regular text, you can simply add new lines by changing the expression

{{$json["parrafo"]}}

to

{{$json["parrafo"].join('\n\n')}}
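
Here is a small sketch of the same idea outside of n8n, in plain JavaScript. The function names `joinAsText` and `joinAsHtml` and the sample data are made up for illustration. The HTML variant assumes the target field accepts HTML, which I have not verified for the Wordpress node.

```javascript
// Hypothetical sample data standing in for $json["parrafo"].
const parrafo = ["First paragraph.", "Second paragraph."];

// Plain text: separate the paragraphs with a blank line.
function joinAsText(paragraphs) {
  return paragraphs.join("\n\n");
}

// HTML: wrap each paragraph in its own <p> element.
function joinAsHtml(paragraphs) {
  return paragraphs.map((p) => `<p>${p}</p>`).join("\n");
}

console.log(joinAsText(parrafo));
console.log(joinAsHtml(parrafo));
```

The text variant is the same operation as the `.join('\n\n')` expression above. The HTML variant is only an alternative to try if the blank lines do not survive in the target.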

Thank you very much, it worked perfectly. Sorry for asking such a simple question, but that’s the NO CODE movement
:smile:

Sadly, we are not quite there yet with n8n. It is still a work in progress :wink: