Extract from HTML

cleveradmin · December 28, 2021, 6:02pm

I suspect this is as complicated as I think it is, but I figured I’d ask here before engaging a developer. I have HTML that I’m currently getting from an IMAP node, and I’m already using HTML Extract to pull out everything between the BODY tags. What would be even better is if I could turn some of the content into variables, but the HTML doesn’t have any class or id attributes that we can use in conjunction with HTML Extract, and we have no control over the HTML we are receiving. Here’s the HTML:

<h2>Client/Agent Information</h2>
<table width="600" border="1" cellspacing="0" cellpadding="4">
   <tbody>
      <tr>
         <td width="130"><strong>Client</strong></td>
         <td>Clever Admin</td>
      </tr>
      <tr>
         <td width="130"><strong>Logged By</strong></td>
         <td>Customer John john@custom.tld</td>
      </tr>
      <tr>
         <td width="130"><strong>Affected User</strong></td>
         <td>Customer John john@custom.tld</td>
      </tr>
      <tr>
         <td width="130"><strong>Agent Name</strong></td>
         <td>JOHNSPC01</td>
      </tr>
   </tbody>
</table>
<br> 
<h2>Ticket Header Information</h2>
<table width="600" border="1" cellspacing="0" cellpadding="4">
   <tbody>
      <tr>
         <td width="130"><strong>Ticket Type</strong></td>
         <td>Emergency</td>
      </tr>
      <tr>
         <td width="130"><strong>Subject</strong></td>
         <td>Computer Issue</td>
      </tr>
      <tr>
         <td width="130"><strong>Submitted By</strong></td>
         <td>Customer John</td>
      </tr>
      <tr>
         <td width="130"><strong>Affected User</strong></td>
         <td>Customer John</td>
      </tr>
   </tbody>
</table>
<h2>Issue Description</h2>
I'm having an issue<br><br> 
<h2>Smart Engineer (If Applicable)</h2>
<br><br> 
<h2>Form Answer Data (If Applicable)</h2>
<br><br> 
<h2>Diagnostic Information</h2>
System Information <br>Host Name: JOHNSPC01 <br>OS Name: Microsoft Windows 10 Pro <br>OS Version: Windows 10 Pro.2009.19041.1.amd64fre.vb_release.191206-1406 <br> <br>User Name: John <br>User Domain: JOHNSPC01

And I’d like to be able to separate out the value in the second TD of each table row, and then the issue description and the other sections that follow the tables. Using the example above, I’d end up with the following variables:

client: "Clever Admin"
loggedby: "Customer John john@custom.tld"
affecteduser: "Customer John john@custom.tld"
agentname: "JOHNSPC01"
tickettype: "Emergency"
subject: "Computer Issue"
submittedby: "Customer John"
affectedusername: "Customer John"
issuedescription: "I'm having an issue<br><br>"
smartengineer: "<br><br>"
formdata: "<br><br>"
diagnostic: "System Information <br>Host Name: JOHNSPC01 <br>OS Name: Microsoft Windows 10 Pro <br>OS Version: Windows 10 Pro.2009.19041.1.amd64fre.vb_release.191206-1406 <br> <br>User Name: John <br>User Domain: JOHNSPC01"

Is there an easy, non-Javascript (or non-extensive-Javascript) way to do this in n8n? I should also mention that all the titles like Client, Logged By, Affected User, etc. are constants and do not change in the source HTML.

harshil1712 · December 29, 2021, 7:26am

Hey @cleveradmin,

I have come across a similar situation in the past, and I used the nth-child CSS selector. If the order of the values is fixed, then this selector might help.
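
As a rough sketch of that idea (not a tested workflow), the positional selectors for the two tables could be generated like this. It assumes both tables are siblings inside the extracted body and that the row order never changes. The key names come from the desired output in the question; `buildSelectors`, `splitSections`, `TABLE_KEYS` and `SECTION_KEYS` are hypothetical names made up for illustration. The sections after the tables are bare text with no element of their own, so a CSS selector cannot reach them; the second helper splits the HTML on the `<h2>` headings instead, which means a small amount of JavaScript in a Function node.

```javascript
// Hypothetical key names, taken from the desired output in the question.
// Each inner array lists the rows of one table, in document order.
const TABLE_KEYS = [
  ['client', 'loggedby', 'affecteduser', 'agentname'],
  ['tickettype', 'subject', 'submittedby', 'affectedusername'],
];

// Build one { key, cssSelector } pair per table row. The value is always in
// the second <td> of its row, so only the table and row positions change.
// Assumption: the tables are siblings, so nth-of-type counts them in order.
function buildSelectors(tableKeys) {
  const selectors = [];
  tableKeys.forEach((keys, tableIndex) => {
    keys.forEach((key, rowIndex) => {
      selectors.push({
        key,
        cssSelector:
          `table:nth-of-type(${tableIndex + 1}) ` +
          `tr:nth-child(${rowIndex + 1}) td:nth-child(2)`,
      });
    });
  });
  return selectors;
}

// Hypothetical mapping from the constant <h2> titles to variable names.
const SECTION_KEYS = {
  'Issue Description': 'issuedescription',
  'Smart Engineer (If Applicable)': 'smartengineer',
  'Form Answer Data (If Applicable)': 'formdata',
  'Diagnostic Information': 'diagnostic',
};

// Split the HTML on <h2> headings. Because the regex has a capture group,
// split() returns [before, title1, body1, title2, body2, ...].
// Headings that are not listed in SECTION_KEYS (the table headings) are skipped.
function splitSections(html) {
  const result = {};
  const parts = html.split(/<h2>(.*?)<\/h2>/);
  for (let i = 1; i < parts.length; i += 2) {
    const key = SECTION_KEYS[parts[i].trim()];
    if (key) result[key] = parts[i + 1].trim();
  }
  return result;
}
```

Each generated pair would then be entered as one extraction value in HTML Extract, for example `client` with `table:nth-of-type(1) tr:nth-child(1) td:nth-child(2)`. If the sender ever reorders rows or wraps the tables in other elements, the positions need to be adjusted by hand.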

cleveradmin · December 29, 2021, 2:00pm

Yes, I suspect that will work. We are actually no longer using this workflow as we’ve implemented something different, but I will keep that in mind should I come across this requirement in the future. Thank you.
