Regex matches across two data sets

Hi all, I’m a new user, please be gentle :slight_smile:
I’m trying to compare two different datasets: one of CVE numbers, the other a list of installed software.

Describe the problem/error/question

How do I match strings from one field to those in another?
Basically, I am trying to find any string from the 1st dataset that appears in the titles of the 2nd dataset.

1st dataset:
IBM
Linux
Rockwell
Plums

2nd dataset:
IBM Concert
Linux Banana
Rockwell Sausages
Clams

What is the error message (if any)?

I’m just getting incorrect results.

Information on your n8n setup

  • n8n version: 1.58.2
  • Database (default: SQLite): n/a
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app): Desktop
  • Operating system: Debian 12

Welcome to the community @Gibbon !

Tip for sharing information

Pasting your n8n workflow


Copy your n8n workflow and paste it into a code block, that is, between a pair of triple backticks. You can also create the block by clicking </> (preformatted text) in the editor and then pasting in your workflow.

```
<your workflow>
```

The same applies to any JSON output you would like to share with us.


Here’s a demo workflow you could adapt.

Will it work for you?

Here’s the logic. You make all the possible combinations of the 2 datasets. Once they are combined, you keep only those combinations where the string from the 1st dataset occurs inside the string from the 2nd dataset. The output is the set of items (strings) from the 1st dataset that were found in the items of the 2nd dataset.
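To make that logic concrete outside of n8n, here is a minimal Python sketch of the same idea: build every keyword/title pair, then keep the pairs where the title contains the keyword. It is not the workflow itself, and the property names `keyword` and `title` are assumptions chosen for illustration.

```python
from itertools import product

# Items shaped like n8n items: one object per row.
# The property names "keyword" and "title" are illustrative assumptions.
dataset_1 = [{"keyword": k} for k in ["IBM", "Linux", "Rockwell", "Plums"]]
dataset_2 = [{"title": t} for t in ["IBM Concert", "Linux Banana", "Rockwell Sausages", "Clams"]]


def find_matches(keywords, titles):
    """Pair every keyword with every title, then keep the pairs
    where the title contains the keyword as a substring."""
    combined = [{**k, **t} for k, t in product(keywords, titles)]
    return [pair for pair in combined if pair["keyword"] in pair["title"]]


matches = find_matches(dataset_1, dataset_2)
matched_keywords = [m["keyword"] for m in matches]
print(matched_keywords)  # ['IBM', 'Linux', 'Rockwell']
```

Note that the combined list has one entry per pair, so its size is the product of the two dataset sizes. That is fine for small lists but grows quickly for large ones.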

Hi, thanks for the reply. It’s not working for me at the moment; it basically just discards everything.

Hey @Gibbon, if you run my workflow you will see that it works. Perhaps you misconfigured yours, or your datasets are not structured the way my demo expects.

If you share your workflow with a sample of the actual data, we could look into it further.

Hi, Thanks, this is a sample of actual data.

Dataset 1:
LDAP
Azure
IBM Concert
Defender
Linux

Dataset 2:
WordPress WPFactory Helper Plugin Reflected Cross-Site Scripting Vulnerability
Vernemq Memory Allocation Denial of Service Vulnerability
LDAP Allocation Denial of Service Vulnerability

So it’s not just matching whole lines: keywords from dataset 1 will appear somewhere within the lines of dataset 2, and I need to match on those keywords.

Thanks! :slight_smile:

Hey @Gibbon, I didn’t have to change anything in the workflow; I just replaced the datasets.

The only keyword from Dataset 1 that is found in Dataset 2 is “LDAP”. That is exactly what the workflow returns. What am I missing here? Did you actually run the workflow I offered?
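As a cross-check on that result, here is a small standalone Python sketch that runs the sample data through a regex match. It is a variation rather than what the workflow does: it assumes you want case-insensitive, whole-word matching, and the function name `keywords_in_titles` is made up for this example.

```python
import re

keywords = ["LDAP", "Azure", "IBM Concert", "Defender", "Linux"]
titles = [
    "WordPress WPFactory Helper Plugin Reflected Cross-Site Scripting Vulnerability",
    "Vernemq Memory Allocation Denial of Service Vulnerability",
    "LDAP Allocation Denial of Service Vulnerability",
]


def keywords_in_titles(keywords, titles):
    """Return the keywords that occur as whole words in at least one title."""
    found = []
    for keyword in keywords:
        # re.escape keeps characters such as "." or "+" in a keyword literal.
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        if any(pattern.search(title) for title in titles):
            found.append(keyword)
    return found


print(keywords_in_titles(keywords, titles))  # ['LDAP']
```

The word boundaries mean a keyword like "Alloc" would not match "Allocation". Drop the `\b` parts if plain substring matching is what you need.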

How exactly is the dataset represented? What you show is not how data is represented in n8n. Perhaps that is where the confusion comes from?

To be more specific, in n8n Dataset 1 is

[
  {
    "dataset 1": "LDAP"
  },
  {
    "dataset 1": "Azure"
  },
  {
    "dataset 1": "IBM Concert"
  },
  {
    "dataset 1": "Defender"
  },
  {
    "dataset 1": "Linux"
  }
]

Similarly, Dataset 2 is

[
  {
    "dataset 2": "WordPress WPFactory Helper Plugin Reflected Cross-Site Scripting Vulnerability"
  },
  {
    "dataset 2": "Vernemq Memory Allocation Denial of Service Vulnerability"
  },
  {
    "dataset 2": "LDAP Allocation Denial of Service Vulnerability"
  }
]

Maybe you have your dataset as an array? That is, it is in the form

[
  "LDAP",
  "Azure",
  "IBM Concert",
  "Defender",
  "Linux"
]

That could easily be converted to the first form, after which the rest of the workflow continues unchanged. Again, everything is working as I would expect, unless I’m still misunderstanding your requirements.
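For illustration, the conversion from a plain array to one object per item can be sketched in a few lines of Python. This shows the reshaping only, not the n8n node configuration, and the helper name `to_items` is an assumption.

```python
def to_items(values, property_name):
    """Wrap each plain string in an object keyed by property_name."""
    return [{property_name: value} for value in values]


plain_array = ["LDAP", "Azure", "IBM Concert", "Defender", "Linux"]
items = to_items(plain_array, "dataset 1")
print(items[0])  # {'dataset 1': 'LDAP'}
```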

Here’s the workflow again with the new datasets. I also renamed the properties to make it clearer.

The outcome

If your data is in the form of an array, here is the workflow for it, which returns the very same result.

I think the problem is with my data. It’s not coming out of the text editing and dedupe steps the way I want; it still has the field name in there:

And I don’t know how to get rid of everything except the string I need.

I see. You just need to rename the property.
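As a rough sketch of what renaming the property means for the data, here is a standalone Python version. The field names `data` and `keyword` are hypothetical, since the actual names in your items are not shown here, and `rename_property` is a made-up helper rather than an n8n function.

```python
def rename_property(items, old_name, new_name):
    """Return copies of the items with old_name renamed to new_name."""
    renamed = []
    for item in items:
        # Copy everything except the old property, then re-add it under the new name.
        updated = {key: value for key, value in item.items() if key != old_name}
        if old_name in item:
            updated[new_name] = item[old_name]
        renamed.append(updated)
    return renamed


# "data" and "keyword" are hypothetical field names used only for this example.
items = [{"data": "LDAP"}, {"data": "Azure"}]
renamed_items = rename_property(items, "data", "keyword")
print(renamed_items)  # [{'keyword': 'LDAP'}, {'keyword': 'Azure'}]
```

Once the property name matches what the comparison step expects, the rest of the matching logic should not need to change.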
