Create/Read/Update/Delete Static Data Node Types

I read about the concept of “static data” within a workflow here:

I find it exceptionally odd that, as far as I can tell, the only way to perform CRUD operations on "static data" is to use Code nodes in my workflow.

If that assumption is true, then you're forcing users to add many different Code nodes throughout a given workflow any time they want to perform CRUD operations on static data.
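For illustration, here is roughly what those operations look like today inside a single Code node. This is just a sketch based on the documented $getWorkflowStaticData helper; the counters key is a made-up example:

// Create/Update: grab the global static data object and mutate it
const staticData = $getWorkflowStaticData('global');
staticData.counters = staticData.counters || {};
staticData.counters.processed = (staticData.counters.processed || 0) + 1;

// Read: simply access the property
const processed = staticData.counters.processed;

// Delete: remove the key so it is no longer persisted
delete staticData.counters;

return [{ json: { processed } }];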

The problem, though, is that we know adding Code nodes to any workflow makes memory bloat, or even out-of-memory (OOM) conditions, more likely. Literally, the documentation here:

Says: AVOID using the CODE node where possible

So, if we're not supposed to use the Code node, but we need to be able to manipulate static data – how can we do this in a memory-efficient way?

I'd propose that n8n expose primitive node types that allow us to perform CRUD operations on static data without relying on the memory-heavy Code node.

Does this make sense?

You can use the Code node for static data operations, but you have to be careful: if the staticData becomes very big, you may run into issues between n8n and the database that can only be fixed with direct database operations (the staticData isn't easily deleted from the workflow otherwise). I once had a situation where my static data was so big that my n8n instance kept going down due to OOM issues, and the only fix was to manually remove it from the database.

As it says in the staticData page:

You can save data directly in the workflow. This data should be small.

You will be fine if it's small. The Code node doesn't have a big impact on memory when it's only handling a small amount of data.

Hi @barn4k , I appreciate the sanity check. Thank you.

To be crystal clear, do OOM issues trigger if you have, say, 8 or 9 different Code nodes in a given workflow – where each Code node is just performing various CRUD operations on 1 or 2 "small" staticData elements?

In case of future troubleshooting, could you also let me know which DB table stores staticData per workflow? (This would be handy in case I ever encounter OOM issues similar to the ones you've described.)

Thanks in advance!

I assume they don’t

The issues with static data and OOM don't actually depend on the Code node :wink:
They may occur when you save something big into static data with any Code node, using code like:

const stData = $getWorkflowStaticData('global')

// assigning a very large object here bloats the workflow's DB entry
stData.data = <10MB json object>

Right after the node executes and performs this operation, that data will be stored in the workflow's DB entry (you can see that data by using the n8n API method to get the workflow).
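For example, something like this (just a sketch; the host, API key, and workflow ID are placeholders, and I'm assuming the workflow object returned by the public API includes the staticData field):

// Fetch a workflow via the n8n public API and inspect its staticData
const response = await fetch('https://your-n8n-host/api/v1/workflows/PS1XKUs693WVEhkc', {
  headers: { 'X-N8N-API-KEY': process.env.N8N_API_KEY },
});
const workflow = await response.json();
console.log(workflow.staticData);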

To read the staticData from the DB:

SELECT "staticData", "name" FROM workflow_entity
WHERE id = 'PS1XKUs693WVEhkc'
LIMIT 1;

To delete the staticData from the DB:

UPDATE workflow_entity
SET "staticData" = '{}'::json
WHERE id = 'PS1XKUs693WVEhkc'
RETURNING *;

You won't run into any issues if, for example, you only keep timestamps there:

// get the timestamp
const stData = $getWorkflowStaticData('global')

const ts = stData.ts
return {ts: ts}
// set the timestamp
const stData = $getWorkflowStaticData('global')

stData.ts = $now.toMillis()
return {ts: stData.ts, date: $now}

Hey @barn4k, thanks for the clarification.

I wanted to clarify a couple of other assumptions about static data:

  1. Static data persists across workflow executions, correct? Meaning, one execution of the workflow might set some static data that a future execution of the same workflow might then access and read, correct?

  2. If 2 or more executions of a given workflow are running in parallel, then it's possible for all of those executions to manipulate the same static data at the same time, correct?

  3. If #1 and #2 are true, then there is no built-in mechanism to perform atomic CRUD operations on static data, correct? Like, I’m thinking about mutex/semaphore locking to avoid possible race conditions associated with 2 or more parallel executions happening within the same workflow.

If all of my previous assumptions are correct, then it seems like static data is really a dangerous mechanism to use if you ever expect multiple executions of the same workflow to be running at the same time – where each execution needs to perform CRUD operations on the same data stored in the DB.

If all you're looking for is a pseudo "thread-safe" way to use static data – where the static data within a single execution of a single workflow does not need to be shared with any other executions – then it seems like using a data structure like a hash table keyed by the execution ID makes a lot of sense, something like:

// Get the global workflow static data
const workflowStaticData = $getWorkflowStaticData('global');

// Initialize execution-specific temp storage, keyed by the current execution ID
workflowStaticData.tempStorage = workflowStaticData.tempStorage || {};
workflowStaticData.tempStorage[$execution.id] = { /* ... */ };

// Delete the execution-specific temp storage once the execution is done
delete workflowStaticData.tempStorage[$execution.id];

My point is that the original documentation should probably be updated to include these sorts of nuances; otherwise it feels like playing with fire.

Does this make sense?


After thinking more about the "thread-safe" pseudocode example I previously listed, I'm not sure it would actually work safely…

Ultimately, the static data is getting stored in a DB record…

And if two parallel executions attempt to update the same DB record at the same time, then it’s possible to get unintentional data loss with this strategy…

Therefore, I think my conclusion is that you really can't safely use static data if your workflow ever has more than one execution running at the same time.
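To illustrate the lost update I'm worried about, here's a toy simulation in plain Node.js (not n8n code; it just models two executions that each read a snapshot of the persisted staticData, modify their own copy, and write it back):

// Toy model of the lost-update race: both "executions" start from the same
// persisted snapshot, and whichever write lands last silently wins.
let persisted = { tempStorage: {} };            // what the DB row holds

const snapshotA = structuredClone(persisted);   // execution A reads
const snapshotB = structuredClone(persisted);   // execution B reads

snapshotA.tempStorage['exec-A'] = { step: 1 };  // A mutates its copy
snapshotB.tempStorage['exec-B'] = { step: 1 };  // B mutates its copy

persisted = snapshotA;                          // A's state is saved
persisted = snapshotB;                          // B's save overwrites A's

console.log(persisted.tempStorage);             // only exec-B survives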

I wish there was a better built-in mechanism within n8n to:

  • Set temp variables used in one “branch” of a Switch node
  • … and then allow those temp variables to be used in a different "branch" of a Switch node
  • … and once the execution completes, the temp variables are automatically deleted from memory (non-persistent)

Separately, I wish there were also an atomic set of primitives in n8n that could allow me to:

  • Perform CRUD operations on variables in a thread-safe way
  • … where 2 or more parallel executions need to update/use the same variables in a coordinated manner

Correct

Correct

There is no built-in mechanism to avoid a race condition. So you have to build it on your own :slight_smile:

Actually, if you need something to be stored only within one execution, you don't need static data at all, as you can reference the output of any node that has already executed.
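For example (the node name "Set Temp Vars" and the myFlag field below are just made up for illustration):

// Inside a later Code node, read the output of any node that already ran
const myFlag = $('Set Temp Vars').first().json.myFlag;

return [{ json: { myFlag } }];

The same $('Node Name') reference also works in expressions on regular nodes.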

Maybe, let’s ask @bartv :face_with_hand_over_mouth:

Well… It may, if you design it to be :slight_smile:

E.g. you can define a lock mechanism, like in the example below:

In the first Code node I check for an existing lock. If no lock is present, or it was set more than 1 hour ago, the flow proceeds: it sets a new lock along with the date it was taken, performs some actions, and at the end releases the lock for the next execution. The lock date is only needed for the situation where something in the workflow breaks and the workflow state gets frozen (i.e. the lock wasn't released properly).
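Roughly like this (just a sketch of the idea; the locked/lockedAt field names are made up, and the last node in the branch simply sets locked back to false to release it):

// First Code node: check for an existing lock and acquire it if possible
const stData = $getWorkflowStaticData('global');
const ONE_HOUR = 60 * 60 * 1000;

const lockIsStale = !stData.lockedAt || ($now.toMillis() - stData.lockedAt) > ONE_HOUR;

if (stData.locked && !lockIsStale) {
  // another execution holds a fresh lock, so stop this branch here
  return [{ json: { proceed: false } }];
}

// acquire the lock and remember when it was taken
stData.locked = true;
stData.lockedAt = $now.toMillis();
return [{ json: { proceed: true } }];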

I’m using that mechanism in a couple of my workflows and it works pretty well.

Hey @barn4k , I appreciate the example workflow showing how to implement a pseudo-locking mechanism.

I think the only time such a mechanism might fail is if you're dealing with high-frequency workflows – workflows that execute in parallel all the time.

In that scenario, I could foresee two parallel executions firing at the same moment, right after the lock timestamp has expired; both would see the stale lock and then both would attempt to set a new lock in static data at the same time.

Sure, it’s rare, but a race condition like that is still possible.

To be honest, I’m not sure it’s possible to implement atomic locking within a Code node. I’m thinking we might need some lower level primitives to implement this well. Plus, it gets more complicated if n8n is running in a distributed manner – where 2 different executions are running in parallel across 2 different worker nodes, etc.

Ultimately, it seems like using some sort of external queuing (like AMQP) might be the safer choice – albeit a more complicated one.

Yes, the message broker service is your friend here :slight_smile:
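For example, instead of letting parallel executions touch the same staticData, you could publish work items to a queue and let a single consumer workflow process them one message at a time. A rough sketch of the producer side with the amqplib package (plain Node.js; the broker URL and queue name are placeholders, and n8n also has dedicated RabbitMQ nodes that cover the same idea):

// Publish a work item to a queue so a single consumer can process items serially
const amqp = require('amqplib');

async function publishWorkItem(payload) {
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();

  await channel.assertQueue('workflow-work-items', { durable: true });
  channel.sendToQueue(
    'workflow-work-items',
    Buffer.from(JSON.stringify(payload)),
    { persistent: true }
  );

  await channel.close();
  await connection.close();
}

publishWorkItem({ task: 'something to process' }).catch(console.error);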