n8n AI Workflow – PDF + Excel → Product Matching → CO₂ Data Extraction

Hi everyone,

I’m working on an n8n workflow using an AI Agent and I’d appreciate some guidance on how to best design this.

Goal

I want n8n to automatically identify HVAC (VVS) products, find their environmental data from Swegon, and compile CO₂ emissions data into an Excel file.

Input data

I have two parallel inputs:

  1. A PDF

    • Created by VVS consultants

    • Contains text describing which types of HVAC products are used

    • Product names may be partial, generic, or descriptive rather than exact article numbers

  2. An Excel file

    • Contains structured product-related data such as:

      • Product name

      • Product number / article number

      • Product category

      • Manufacturer (Swegon)

      • Other identifiers

What I want the workflow to do

  1. Read and analyze the PDF to extract referenced HVAC/VVS products.

  2. Read and analyze the Excel file in parallel.

  3. Cross-reference the PDF and Excel data to determine the most likely exact product:

    • Match by name similarity, product type, and technical characteristics

    • Handle cases where the match is uncertain

  4. Once the product is identified:

    • Go to Swegon’s official website

    • Find the correct product page

    • Store the product URL

  5. From the product page:

    • Download the relevant product PDF (EPD / environmental or sustainability documentation)

  6. Read the product PDF and extract:

    • CO₂ emissions data (e.g. kg CO₂e)

    • Unit and lifecycle stage (A1–A3, etc.), if available

  7. Compile the results into an Excel-compatible output with columns like:

    • Product Name

    • Product Number

    • Product Category

    • Product URL

    • CO₂ Emissions Value

    • CO₂ Unit

    • Lifecycle Stage

    • Source PDF

My main questions

  • What is the best way in n8n to:

    • Run PDF and Excel analysis in parallel?

    • Merge and match the data reliably?

  • Should product matching be handled entirely by an AI Agent, or partly with rules before AI?

  • Any recommended node patterns (Merge, Split in Batches, AI Agent, HTTP Request)?

  • How would you structure this to keep it robust and cost-efficient?

Any examples, tips, or architectural advice would be greatly appreciated.

Thanks in advance!

This is doable in n8n, but I would take a rules-first, AI-second approach to keep the workflow cost-efficient and reliable.
At a high level, a simple trigger fans out into two branches: one that extracts and analyzes the PDF, and one that normalizes the Excel data. A Merge node then brings the two branches back together.
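
To make the "normalize the Excel data" branch concrete, here is a minimal sketch of the kind of normalization that could run before the Merge node (for example in a Code node). The column names ("Product name", "Product number", and so on) are assumptions taken from the question's description, and the helper names are hypothetical, so adjust them to the real sheet.

```python
import re
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", str(name))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def normalize_article_number(value):
    """Keep only letters and digits, uppercased, so '123-456 a' == '123456A'."""
    # Spreadsheet readers often return numeric cells as floats (123456.0).
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return re.sub(r"[^A-Za-z0-9]", "", str(value)).upper()


def normalize_row(row):
    """Build comparable keys from one Excel row (column names are assumed)."""
    return {
        "name": row.get("Product name", ""),
        "name_key": normalize_name(row.get("Product name", "")),
        "article_key": normalize_article_number(row.get("Product number", "")),
        "category": normalize_name(row.get("Product category", "")),
        "manufacturer": normalize_name(row.get("Manufacturer", "")),
    }
```

If the PDF branch runs its extracted product mentions through the same functions, both sides arrive at the Merge node with keys that can be compared directly.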

I wouldn’t rely completely on an agent for matching. Start with deterministic rules (manufacturer = Swegon, category, article number if present, basic name similarity). Only if the match is still uncertain, send it to the AI to resolve. Store confirmed matches in a small database such as Airtable so you don’t repeat calls for products you have already resolved.
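
As a rough illustration of rules first, AI second, the sketch below applies the deterministic rules in order and only flags a mention as "needs_ai" when the name similarity falls in a grey zone. The thresholds (0.85 and 0.5), the field names, and the in-memory `cache` dict (standing in for something like an Airtable lookup) are all assumptions to tune against real data.

```python
from difflib import SequenceMatcher

AUTO_ACCEPT = 0.85  # assumed: at or above this, accept without AI
NEEDS_AI = 0.5      # assumed: between the two thresholds, ask the AI


def name_similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_product(mention, catalog, cache):
    """Match one PDF mention against Excel rows using rules before AI."""
    key = mention["text"].lower().strip()
    if key in cache:  # already resolved earlier, no new work needed
        return cache[key]

    # Rule 1: manufacturer must be Swegon.
    candidates = [p for p in catalog if p["manufacturer"].lower() == "swegon"]

    # Rule 2: an exact article number wins outright.
    article = mention.get("article_number")
    if article:
        for p in candidates:
            if p.get("article_number") == article:
                result = {"product": p, "score": 1.0, "status": "matched"}
                cache[key] = result
                return result

    # Rule 3: narrow by category when the mention has one.
    category = mention.get("category")
    if category:
        same = [p for p in candidates if p["category"].lower() == category.lower()]
        candidates = same or candidates

    if not candidates:
        return {"product": None, "score": 0.0, "status": "unmatched"}

    # Rule 4: rank the remaining candidates by name similarity.
    scored = sorted(
        ((name_similarity(mention["text"], p["name"]), p) for p in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best_score, best = scored[0]
    if best_score >= AUTO_ACCEPT:
        status = "matched"
    elif best_score >= NEEDS_AI:
        status = "needs_ai"
    else:
        status = "unmatched"

    result = {
        "product": best,
        "score": best_score,
        "status": status,
        # Short list to hand to the AI step instead of the whole catalog.
        "candidates": [p for _, p in scored[:3]],
    }
    if status == "matched":
        cache[key] = result
    return result
```

Only items with status "needs_ai" would go on to the AI Agent, together with the short candidate list, which keeps the prompt small and the number of calls low.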

When crawling Swegon’s site, use Split in Batches, then an HTTP Request node to fetch the product pages, download the linked product PDF, and extract the values from it. Keep the scope tight (only send the AI the text it actually needs) to cut down on AI usage.
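
One way to keep the AI scope small is to try a plain pattern match on the extracted PDF text first and only fall back to the AI when nothing is found. The sketch below assumes the PDF has already been converted to text and that values appear on the same line as their unit (e.g. "12,5 kg CO2e"). Real EPD layouts vary a lot, so treat the patterns as a starting point, not a complete parser.

```python
import re

# A number followed by a CO2-equivalent unit, e.g. "12,5 kg CO2e" or "0.8 kg CO₂-eq".
VALUE_UNIT = re.compile(
    r"(?P<value>\d+(?:[.,]\d+)?)\s*(?P<unit>kg\s*CO[2₂]\s*-?\s*e(?:q\.?)?)",
    re.IGNORECASE,
)
# Lifecycle modules such as "A1-A3", "C1–C4", "B6" or "D".
STAGE = re.compile(
    r"\b(A[1-5]|B[1-7]|C[1-4]|D)(?:\s*[-–]\s*(A[1-5]|B[1-7]|C[1-4]))?\b"
)


def extract_co2(text, source_pdf=""):
    """Return one record per CO2 value found, with the stage from the same line."""
    records = []
    for line in text.splitlines():
        stage_match = STAGE.search(line)
        stage = None
        if stage_match:
            start, end = stage_match.group(1), stage_match.group(2)
            stage = f"{start}-{end}" if end else start
        for m in VALUE_UNIT.finditer(line):
            records.append({
                "value": float(m.group("value").replace(",", ".")),
                "unit": "kg CO2e",  # normalized spelling of the matched unit
                "stage": stage,
                "source_pdf": source_pdf,
            })
    return records
```

If `extract_co2` returns an empty list for a document, that document (or just the pages mentioning GWP or CO₂) is the part worth sending to the AI step.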

This tends to be the best and most scalable solution for this type of problem.