I am importing a set of emails from Gmail that all have a specific html table structure. Some of the emails only have one row of data, some have multiple. I figured out a way to extract data from messages that have one row of data but I suspect it isn’t the most efficient method. I have taken a different approach for messages with multiple rows but am stuck. First, here is the html I am trying to parse:
<div>Search Siren has found new listings matching your saved searches</div><br><div>View all of your notifications at <a href="https://www.searchsiren.com/EbaySearch/Results">searchsiren.com</a></div><br><div><table border="1" cellpadding="4px" style="background-color:#5D5C61;color:#FFFFFA;border:1px solid white;border-collapse:collapse;"><tr><td></td><td>Title</td><td align="center">Condition</td><td align="center">Price</td><td align="center">BIN Price</td><td align="center">S&H</td><td align="center">Best Offer</td></tr><tr><td><img src="https://i.ebayimg.com/thumbs/images/g/~bQAAOSw9bNjjOIh/s-l225.jpg"></td><td><a style="color:#FFFFFA" href="https://www.ebay.com/itm/195503610727?hash=item2d84ec5b67%3Ag%3A%7EbQAAOSw9bNjjOIh&mkevt=1&mkcid=1&mkrid=711-53200-19255-0&campid=5338348243&customid=3899&toolid=10049">The Civil War: A Film Directed By Ken Burns PBS DVD Gold New Sealed</td><td align="center">Brand New</td><td align="right"></td><td align="right">$17.00</td><td align="right">Calculated</td><td align="center"></td></tr><tr><td colspan="7" align="left">Search: Civil War Ken Burns</td></tr><tr><td colspan="7" align="left"><a style="color:#FFFFFA" href="https://www.searchsiren.com/EbaySearch/Edit?ebaySearchId=73103">Edit this search</a></td></tr><tr><td><img src="https://i.ebayimg.com/thumbs/images/g/argAAOSwig9c~SN0/s-l225.jpg"></td><td><a style="color:#FFFFFA" href="https://www.ebay.com/itm/134351413296?hash=item1f47f7e030%3Ag%3AargAAOSwig9c%7ESN0&mkevt=1&mkcid=1&mkrid=711-53200-19255-0&campid=5338348243&customid=3899&toolid=10049">The Civil War PBS Home Video VHS Box Set - A Film By Ken Burns </td><td align="center">Brand New</td><td align="right">$10.00</td><td align="right"></td><td align="right">Calculated</td><td align="center">Yes</td></tr><tr><td colspan="7" align="left">Search: Civil War Ken Burns</td></tr><tr><td colspan="7" align="left"><a style="color:#FFFFFA" href="https://www.searchsiren.com/EbaySearch/Edit?ebaySearchId=73103">Edit this search</a></td></tr></table></div>\n\n
My goal is to output the table data to a Google spreadsheet, example of table is below. Note - I don’t want to output any table column headers. Ideal output would be
Listing URL (non image URL) | Title | Condition | Price | Bin | S&H | Best Offer | Search Identifier
Example of Search Identifier is “Search: Civil War”
Example of Table
Search Siren has found new listings matching your saved searches
n8n would always require “column headers” I am afraid as these would represent the keys in the data structure used by n8n. When adding rows to an existing sheet in Google Sheets these keys would also be used by n8n to identify the corresponding columns on the respective sheet. n8n would, however, not add additional column headers to your sheet (if this was a concern for you).
As for extracting the HTML table data itself, I think this would do the job:
This is mostly based on this workflow I shared a while back with some small adjustments to take into account your table structure, using the newer Code node, and using some static data to allow running this workflow directly rather than requiring a specific email to arrive (you would of course need to change the Gmail node back in once you’re happy with the workflow).
Thank you so much! This is amazing. I reviewed the workflow and everything makes complete sense. I have one other question. Is it possible to create another column used to store this data?
<td colspan="7" align="left">Search: Civil War Ken Burns</td>
I’m looking for the column header to be called ID. I was playing around with a second IF node to extract the data and then use a MERGE node to combine the two outputs. I’m a little lost after that.
Btw, your support is excellent and I’m in love with the n8n product.
Thank you so much! This is amazing. I reviewed the workflow and everything makes complete sense. I have one other question. Is it possible to create another column used to store this data?
<td colspan="7" align="left">Search: Civil War Ken Burns</td>
This should be possible. It seems this “search” value is identical for each occurrence, so we can just fetch it once on the HTML Extract node.
Thank you so much. It is all very helpful and I am learning a lot from your examples. The one thing I noticed about the results for the following code is that, it will pull the first element in the search array of the ‘Extract rows’ node and use that value for all rows in the final table, disregarding that each table row from the email will have a unique value
Hi @PBreezy, the example did have identical text values in the Search lines, so based on the data you have provided my understanding was that this value doesn’t change between different elements.
If it does, you could no longer throw away the “colspan” rows and keep only the item rows. You would also need to change how to iterate through your rows, making the code more annoying to read