Help Parsing a Table from a Gmail Message

PBreezy · January 2, 2023, 11:34pm

I am importing a set of emails from Gmail that all have a specific html table structure. Some of the emails only have one row of data, some have multiple. I figured out a way to extract data from messages that have one row of data but I suspect it isn’t the most efficient method. I have taken a different approach for messages with multiple rows but am stuck. First, here is the html I am trying to parse:

<div>Search Siren has found new listings matching your saved searches</div><br><div>View all of your notifications at <a href="https://www.searchsiren.com/EbaySearch/Results">searchsiren.com</a></div><br><div><table border="1" cellpadding="4px" style="background-color:#5D5C61;color:#FFFFFA;border:1px solid white;border-collapse:collapse;"><tr><td></td><td>Title</td><td align="center">Condition</td><td align="center">Price</td><td align="center">BIN Price</td><td align="center">S&H</td><td align="center">Best Offer</td></tr><tr><td><img src="https://i.ebayimg.com/thumbs/images/g/~bQAAOSw9bNjjOIh/s-l225.jpg"></td><td><a style="color:#FFFFFA" href="https://www.ebay.com/itm/195503610727?hash=item2d84ec5b67%3Ag%3A%7EbQAAOSw9bNjjOIh&mkevt=1&mkcid=1&mkrid=711-53200-19255-0&campid=5338348243&customid=3899&toolid=10049">The Civil War: A Film Directed By Ken Burns PBS DVD Gold New Sealed</td><td align="center">Brand New</td><td align="right"></td><td align="right">$17.00</td><td align="right">Calculated</td><td align="center"></td></tr><tr><td colspan="7" align="left">Search: Civil War Ken Burns</td></tr><tr><td colspan="7" align="left"><a style="color:#FFFFFA" href="https://www.searchsiren.com/EbaySearch/Edit?ebaySearchId=73103">Edit this search</a></td></tr><tr><td><img src="https://i.ebayimg.com/thumbs/images/g/argAAOSwig9c~SN0/s-l225.jpg"></td><td><a style="color:#FFFFFA" href="https://www.ebay.com/itm/134351413296?hash=item1f47f7e030%3Ag%3AargAAOSwig9c%7ESN0&mkevt=1&mkcid=1&mkrid=711-53200-19255-0&campid=5338348243&customid=3899&toolid=10049">The Civil War PBS Home Video VHS Box Set - A Film By Ken Burns </td><td align="center">Brand New</td><td align="right">$10.00</td><td align="right"></td><td align="right">Calculated</td><td align="center">Yes</td></tr><tr><td colspan="7" align="left">Search: Civil War Ken Burns</td></tr><tr><td colspan="7" align="left"><a style="color:#FFFFFA" href="https://www.searchsiren.com/EbaySearch/Edit?ebaySearchId=73103">Edit this search</a></td></tr></table></div>\n\n

My goal is to output the table data to a Google spreadsheet, example of table is below. Note - I don’t want to output any table column headers. Ideal output would be

Example of Search Identifier is “Search: Civil War”

Example of Table

Search Siren has found new listings matching your saved searches

View all of your notifications at searchsiren.com

	Title	Condition	Price	BIN Price	S&H	Best Offer
	The Civil War: A Film Directed By Ken Burns PBS DVD Gold New Sealed	Brand New		$17.00	Calculated
Search: Civil War Ken Burns
Edit this search
	The Civil War PBS Home Video VHS Box Set - A Film By Ken Burns	Brand New	$10.00		Calculated	Yes
Search: Civil War Ken Burns
Edit this search

\n\n

Here is the current workflow

More Info

n8n.cloud latest Stable Version

MutedJam · January 3, 2023, 10:37am

Hi @PBreezy, welcome to the community

n8n would always require “column headers” I am afraid as these would represent the keys in the data structure used by n8n. When adding rows to an existing sheet in Google Sheets these keys would also be used by n8n to identify the corresponding columns on the respective sheet. n8n would, however, not add additional column headers to your sheet (if this was a concern for you).

As for extracting the HTML table data itself, I think this would do the job:

This is mostly based on this workflow I shared a while back with some small adjustments to take into account your table structure, using the newer Code node, and using some static data to allow running this workflow directly rather than requiring a specific email to arrive (you would of course need to change the Gmail node back in once you’re happy with the workflow).

This is the result:

Hope this helps!

PBreezy · January 3, 2023, 10:36pm

Thank you so much! This is amazing. I reviewed the workflow and everything makes complete sense. I have one other question. Is it possible to create another column used to store this data?

<td colspan="7" align="left">Search: Civil War Ken Burns</td>

I’m looking for the column header to be called ID. I was playing around with a second IF node to extract the data and then use a MERGE node to combine the two outputs. I’m a little lost after that.

Btw, your support is excellent and I’m in love with the n8n product.

PBreezy · January 3, 2023, 11:01pm

I am also looking for the URL of each row

<a style="color:#FFFFFA" href="https://www.ebay.com/itm/195503610727?hash=item2d84ec5b67%3Ag%3A%7EbQAAOSw9bNjjOIh&mkevt=1&mkcid=1&mkrid=711-53200-19255-0&campid=5338348243&customid=3899&toolid=10049">

Again thank you for your help!

MutedJam · January 4, 2023, 1:57pm

Thank you so much! This is amazing. I reviewed the workflow and everything makes complete sense. I have one other question. Is it possible to create another column used to store this data?
<td colspan="7" align="left">Search: Civil War Ken Burns</td>

This should be possible. It seems this “search” value is identical for each occurrence, so we can just fetch it once on the HTML Extract node.

I am also looking for the URL of each row

<a style="color:#FFFFFA" href="https://www.ebay.com/itm/195503610727?hash=item2d84ec5b67%3Ag%3A%7EbQAAOSw9bNjjOIh&mkevt=1&mkcid=1&mkrid=711-53200-19255-0&campid=5338348243&customid=3899&toolid=10049">

This should work by simply adding a suitable selector to the HTML Extract node.

Here’s an updated example taking the two points above into account:

The updated result produced by this workflow:

Hope this helps!

PBreezy · January 5, 2023, 12:32am

Thank you so much. It is all very helpful and I am learning a lot from your examples. The one thing I noticed about the results for the following code is that, it will pull the first element in the search array of the ‘Extract rows’ node and use that value for all rows in the final table, disregarding that each table row from the email will have a unique value

  result['Search'] = $('Extract rows').first().json.search[0].replace('Search:', '').trim();

Here is sample output code I am working with from the Extract rows node

[
  {
    "rows": [
      "<td></td><td>Title</td><td align=\"center\">Condition</td><td align=\"center\">Price</td><td align=\"center\">BIN Price</td><td align=\"center\">S&amp;H</td><td align=\"center\">Best Offer</td>",
      "<td><img src=\"https://i.ebayimg.com/thumbs/images/g/SbsAAOSwwqhjtPl3/s-l225.jpg\"></td><td><a style=\"color:#FFFFFA\" href=\"https://www.ebay.com/itm/125699785376?hash=item1d444a82a0%3Ag%3ASbsAAOSwwqhjtPl3&amp;mkevt=1&amp;mkcid=1&amp;mkrid=711-53200-19255-0&amp;campid=5338348243&amp;customid=3899&amp;toolid=10049\">“Tár” with Slipcover [Blu-ray + DVD, 2 Disc Set] NO DIGITAL</a></td><td align=\"center\">Brand New</td><td align=\"right\"></td><td align=\"right\">$12.00</td><td align=\"right\">Calculated</td><td align=\"center\"></td>",
      "<td colspan=\"7\" align=\"left\">Search: B0BGVG9SXL</td>",
      "<td colspan=\"7\" align=\"left\"><a style=\"color:#FFFFFA\" href=\"https://www.searchsiren.com/EbaySearch/Edit?ebaySearchId=75324\">Edit this search</a></td>"
    ],
    "search": [
      "Search: B0BGVG9SXL"
    ]
  },
  {
    "rows": [
      "<td></td><td>Title</td><td align=\"center\">Condition</td><td align=\"center\">Price</td><td align=\"center\">BIN Price</td><td align=\"center\">S&amp;H</td><td align=\"center\">Best Offer</td>",
      "<td><img src=\"https://i.ebayimg.com/thumbs/images/g/~bQAAOSw9bNjjOIh/s-l225.jpg\"></td><td><a style=\"color:#FFFFFA\" href=\"https://www.ebay.com/itm/195503610727?hash=item2d84ec5b67%3Ag%3A%7EbQAAOSw9bNjjOIh&amp;mkevt=1&amp;mkcid=1&amp;mkrid=711-53200-19255-0&amp;campid=5338348243&amp;customid=3899&amp;toolid=10049\">The Civil War: A Film Directed By Ken Burns PBS DVD Gold New Sealed</a></td><td align=\"center\">Brand New</td><td align=\"right\"></td><td align=\"right\">$17.00</td><td align=\"right\">Calculated</td><td align=\"center\"></td>",
      "<td colspan=\"7\" align=\"left\">Search: Civil War Ken Burns</td>",
      "<td colspan=\"7\" align=\"left\"><a style=\"color:#FFFFFA\" href=\"https://www.searchsiren.com/EbaySearch/Edit?ebaySearchId=73103\">Edit this search</a></td>",
      "<td><img src=\"https://i.ebayimg.com/thumbs/images/g/argAAOSwig9c~SN0/s-l225.jpg\"></td><td><a style=\"color:#FFFFFA\" href=\"https://www.ebay.com/itm/134351413296?hash=item1f47f7e030%3Ag%3AargAAOSwig9c%7ESN0&amp;mkevt=1&amp;mkcid=1&amp;mkrid=711-53200-19255-0&amp;campid=5338348243&amp;customid=3899&amp;toolid=10049\">The Civil War PBS Home Video VHS Box Set - A Film By Ken Burns </a></td><td align=\"center\">Brand New</td><td align=\"right\">$10.00</td><td align=\"right\"></td><td align=\"right\">Calculated</td><td align=\"center\">Yes</td>",
      "<td colspan=\"7\" align=\"left\">Search: Civil War Ken Burns</td>",
      "<td colspan=\"7\" align=\"left\"><a style=\"color:#FFFFFA\" href=\"https://www.searchsiren.com/EbaySearch/Edit?ebaySearchId=73103\">Edit this search</a></td>"
    ],
    "search": [
      "Search: Civil War Ken Burns",
      "Search: Civil War Ken Burns"
    ]
  }
]

MutedJam · January 5, 2023, 9:11am

Hi @PBreezy, the example did have identical text values in the Search lines, so based on the data you have provided my understanding was that this value doesn’t change between different elements.

If it does, you could no longer throw away the “colspan” rows and keep only the item rows. You would also need to change how to iterate through your rows, making the code more annoying to read

It would still be possible though:

PBreezy · January 6, 2023, 12:12am

Thank you so much for your help! This did the trick. Is there someone I can contact to let them know of you great customer service?

system · January 13, 2023, 12:12am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.