I need to create a workflow that processes about 250k contact entries. (I’d prefer to only process a changeset, but that doesn’t seem to be possible currently.) The workflow needs to run the following tasks:
- loop through a webservice that returns a max of 1000 records (id + email)
- filter out unnecessary records (missing or wrong email)
- loop through the rest of the 1000 records and make a detail request
- filter out other records not meeting quality criteria
- look up existing records by foreign key in 1CRM and do an update/insert (these steps could be extended, as we can extend the API)
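The filter and chunking steps above could be sketched roughly like this; the record shape `{ id, email }` comes from the list, while the helper names and the email check are my own assumptions:

```javascript
// Hypothetical helpers for steps 2 and 3 of the pipeline.
// Assumes each record from the webservice looks like { id, email }.

const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // simple sanity check, not full RFC validation

// Drop records with a missing or malformed email.
function filterRecords(records) {
  return records.filter(r => typeof r.email === 'string' && EMAIL_RE.test(r.email));
}

// Split the surviving records into smaller chunks for the detail lookup.
function chunk(records, size) {
  const out = [];
  for (let i = 0; i < records.length; i += size) {
    out.push(records.slice(i, i + size));
  }
  return out;
}
```

In an n8n Function node this would operate on `items` instead of a plain array, but the logic stays the same.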
An extra bonus is the rate limit of 1000 requests/minute for the first API (at least it returns a header with the remaining capacity).
So at a small scale I have this up and running; currently the rate limit and the triggerless design are the main issues.
My idea was to split the whole process into several small workflows:
- a counter with a delay that calls wf #2 with an integer 1-250 (or more, to lower the number in #2 a bit)
- the 1000-records request, which uses #1 for pagination, handles the first filter and forwards smaller chunks of data to #3
- as the detail lookup can process multiple records at once, I would use this to max out the rate limit, filter the results and forward them to #4
- (not sure if I’ll put this directly into #3, as it would make data handling easier) look up the 1CRM record and do the upsert
This way I expect error tracking to be easier, and the smaller workflows with external triggers should be easier to debug. As an alternative to #1 I could use a workflow with a cron trigger that stores the counter in the global workflow context.
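The #1/#2 split could look roughly like this outside of n8n; the numbers (250k records, 1000 per page) come from the post, while `fetchPage` and the driver shape are hypothetical stand-ins for the webhook calls between workflows:

```javascript
// Hypothetical sketch of the counter (#1) handing page numbers to the
// fetch workflow (#2): pages 1..250 for ~250k records at 1000 per page.

const TOTAL_RECORDS = 250_000;
const PAGE_SIZE = 1000;

function* pageNumbers(total = TOTAL_RECORDS, pageSize = PAGE_SIZE) {
  const pages = Math.ceil(total / pageSize);
  for (let page = 1; page <= pages; page++) yield page;
}

// Workflow #2 stand-in: fetch one page and hand it downstream.
// `fetchPage` is an assumed function wrapping the paginated API call.
async function runPage(page, fetchPage) {
  const records = await fetchPage(page); // up to 1000 { id, email } items
  return records;                        // filtering/forwarding happens in #2/#3
}
```

One thing worth handling either way: if the total record count changes between runs, the last page number should come from the API's own paging metadata rather than a hardcoded 250, otherwise the counter can over- or undershoot.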
Is this a feasible way to solve this? Are there limitations that need to be handled? Any tips?