Build Your Own DeepResearch with N8N + Apify + Notion!

Can’t use OpenAI’s Deep Research because it costs $200/month? Fret not, I’ve got you covered with this bootleg version you can use for pennies!

Hey there :wave: I’m Jim and if you’ve enjoyed this article then please consider giving me a “like” and a follow on LinkedIn or X/Twitter. For more similar topics, check out my other AI posts in the forum.


About a week ago, OpenAI announced Deep Research, an agentic solution designed to excel at complex research tasks and perform better and faster than humans. All the more exciting is that it’s powered by OpenAI’s newest model series, o3, which boasts incredible benchmark scores, beating even GPT-4. The only problem is that at the time of writing, you need to be on OpenAI’s $200/month Pro subscription to use it.

Bummer! Well, not if the open-source community has anything to say about it! Just days after the announcement, HuggingFace and others were already publishing their own versions of Deep Research for free! I spent a day or two reading the various code repositories to learn the ins and outs and realised it was very plausible to build this all in n8n!

So I got to work building my own Deep Research in n8n based on David Zhang’s implementation - much easier than starting from scratch! - and after just a few hours, I had a working draft! I spent another day improving it to include features such as a form interface and the ability to upload the final reports to Notion.

:bulb:Check out the sample research reports here - Jim’s DeepResearcher Reports


The Stack

OpenAI o3-mini for LLM (OpenAI.com)

I’ve been consistently impressed with reasoning models so far and this kind of workflow is probably one of the best use-cases for them. o3-mini is working really well and the price makes it an easy swap if you’re currently using o1-mini for similar templates. Note that at the time of writing, you need to be on usage tier 3 or above - the simplest way is to top up about $100 in your OpenAI account. Otherwise, feel free to swap this out for Gemini 2.0 Flash Thinking or DeepSeek R1.

Apify for SERP & Web Scraping (Apify.com)

Unlike the reference implementation, I decided to go with my currently preferred web scraping service, Apify.com, instead. It’s just a lot cheaper than Firecrawl.ai or SerpAPI.com and there are no arbitrary monthly usage limits which would hinder the usefulness of this workflow.
NEW: I’ve updated the template to use Apify’s new RAG Web Browser which gives the workflow a nice boost - initial testing shows each web scraping cycle reduced by 3-5 minutes. Thanks @jancurn!
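For reference, here’s a minimal sketch of calling the RAG Web Browser actor through Apify’s run-sync API from an n8n Code node. The actor id and the `query`/`maxResults` input fields are my assumptions based on the actor’s docs, so double-check them against its current input schema:

```javascript
// Sketch: build a request to Apify's RAG Web Browser actor, which does
// the SERP search + scraping in one call. Actor id and input field names
// are assumptions - verify against the actor's input schema.
function buildRagBrowserRequest(query, apiToken, maxResults = 3) {
  const actorId = 'apify~rag-web-browser'; // assumed actor id (apify/rag-web-browser)
  return {
    url: `https://api.apify.com/v2/acts/${actorId}/run-sync-get-dataset-items?token=${apiToken}`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, maxResults }),
  };
}

// e.g. feed this into an n8n HTTP Request node
const req = buildRagBrowserRequest('n8n deep research', 'APIFY_TOKEN', 2);
```

The `run-sync-get-dataset-items` endpoint is handy here because it waits for the actor run to finish and returns the scraped items directly, so no polling loop is needed in the workflow.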

Have your own favourite? Share in the comments below!

Notion for Reports (Notion.so)

I wasn’t convinced that a chat interface was the best fit for reports, so I opted for a wiki-like tool such as Notion. This turned out great and better suited to long-form content, plus it allowed past reports to be shareable, editable and searchable. The only pain was uploading the generated markdown report, and I settled on a solution that converts the markdown to HTML and then HTML to Notion blocks using Gemini…

Gemini 2.0 Flash for Markdown to Notion Conversion (Gemini)

Sometimes you just need to use the right model for the task and in this instance, Gemini 2.0 just works better for conversion tasks in my experience. Note, if you’re not using Notion to store reports, feel free to swap this out.
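To illustrate why an LLM helps here, this is a minimal deterministic markdown-to-Notion-blocks sketch that only handles headings and paragraphs. Everything richer (nested lists, tables, code blocks) is exactly what makes hand-rolled conversion painful and why the template delegates it to Gemini instead:

```javascript
// Naive markdown -> Notion blocks conversion covering only headings (h1-h3)
// and plain paragraphs. Block shapes follow the Notion API's block object
// format (rich_text arrays of text objects).
function markdownToNotionBlocks(markdown) {
  return markdown
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => {
      const heading = line.match(/^(#{1,3})\s+(.*)$/);
      const type = heading ? `heading_${heading[1].length}` : 'paragraph';
      const text = heading ? heading[2] : line;
      return {
        object: 'block',
        type,
        [type]: { rich_text: [{ type: 'text', text: { content: text } }] },
      };
    });
}
```

The resulting array can be sent to Notion’s “append block children” endpoint; handling nested structures correctly is where this naive approach falls over.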

N8N for everything else! (n8n.io)

I’m loving the new updates to n8n’s forms and now that you can do all sorts with custom HTML fields, they can make for really cool frontends! For the template, I built a simple series of custom forms to initiate the deep research (see screenshot).
As for the rest, it was actually quite easy to map the reference code to various n8n nodes, which makes the process easy to understand. Of course, I’m also taking full advantage of sub-workflows as they help with managing performance.


How it works

Deep Search via Recursive Web Search+Scraping

This Deep Research approach works very much like a human researcher - starting from a single search query, it generates subqueries based on the new information it receives and searches again. This cycle continues until enough data is collected to generate the full report. The more cycles performed, the better the information collected and thus the better the report produced - however, it takes much longer to complete. Humans, with our limited attention spans, tire very quickly, but AI automation can go much further - this is Deep Research!

In code, this idea can be represented as a recursive loop, and the challenge was implementing it in n8n, which is built to be mostly linear. I actually figured out the technique a while ago when I was working on a template for AI content generation, and here’s a quick explanation I wrote earlier of the same technique.

Once implemented, it’s then just a case of making the “depth” and “breadth” variables adjustable, allowing the user to balance the number of cycles, cost and time they have available for their research.
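For the curious, the recursive loop the n8n workflow emulates can be sketched in plain JavaScript. Here `search` and `deriveSubqueries` are hypothetical stand-ins for the Apify scraping step and the LLM subquery-generation step:

```javascript
// Sketch of the recursive deep-research loop: each level searches, records
// the findings, derives up to `breadth` subqueries, and recurses until
// `depth` is exhausted. `ops.search` and `ops.deriveSubqueries` stand in
// for the Apify and LLM calls respectively.
function deepResearch(query, depth, breadth, ops, learnings = []) {
  const results = ops.search(query);              // SERP + scrape step
  learnings.push(...results);                     // accumulate findings
  if (depth > 0) {
    const subs = ops.deriveSubqueries(query, results); // LLM generates follow-ups
    for (const sub of subs.slice(0, breadth)) {
      deepResearch(sub, depth - 1, breadth, ops, learnings);
    }
  }
  return learnings;
}
```

With depth=1 and breadth=2, the root query plus two subqueries get searched, which is exactly the fan-out the depth/breadth form parameters control.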

Report generation using Reasoning Models

If you’re still reading up on reasoning models, check out this X/Twitter thread from the OpenAI team about their o-series models. To paraphrase, reasoning models use a multi-step planning approach which excels at figuring out unstructured data and generating longer, more detailed outputs. Perfect for research!

In the template, I’ve stuck with minimal prompts in the AI nodes but believe they are far from optimised for every research use-case. If you do decide to use this workflow, definitely learn more about reasoning models and update the prompts accordingly!


How To Use

  1. Follow the setup instructions contained within the template: ensure you have access to o3-mini and Apify, and that your Notion database is set up.
  2. This template is designed to be initiated via its form, so either trigger it manually or activate the template to make it publicly available.
  3. From the form, you can enter your research topic and configure 2 parameters - depth and breadth - which determine how “deep” you want the research to go.
  4. This is not a quick workflow! With the default depth=1/breadth=2 settings, executions are expected to take at least 10 minutes - this will get you about 20 sources (i.e. scraped webpages). If you want more sources, increase the depth or breadth (or both), but be aware it could take a very long time to complete.
  5. Sit back and relax… Seriously, go make a cup of tea and come back later! The Deep Research is designed to work independently without human supervision.
  6. Finally, check the Notion page created for your research topic. The status will change to “Done” when the final report is uploaded.
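As a rough back-of-envelope for the source count in step 4, one plausible model (an assumption, not the template’s exact fan-out) is that each depth level issues `breadth` queries and each query scrapes around 10 pages:

```javascript
// Rough source-count estimate: assumes each depth level issues `breadth`
// queries and each query yields ~10 scraped pages. The real count depends
// on the template's Apify settings, so treat this as an approximation.
function estimateSources(depth, breadth, pagesPerQuery = 10) {
  return depth * breadth * pagesPerQuery;
}
```

Under these assumptions the default depth=1/breadth=2 gives 1 × 2 × 10 = 20 sources, matching the figure above; bumping either parameter grows the total (and the runtime) multiplicatively.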

If you’re having trouble, let me know in the comments.


The Template

I had a lot of fun building this template and will be releasing it for free on my n8n creator page. Please have a go and leave a comment with your thoughts and feedback. Did I misunderstand anything about Deep Research? How would you improve this template?

:warning: Requires n8n @ 1.77.0+

old version: n8n_deepresearch_community_share.json - Google Drive


Conclusion

It’ll always be interesting to me how we’ll keep discovering new ways of improving our use of LLMs using the most fundamental of methods. Credit to OpenAI for sparking this community effort and to David Zhang (@dzhng) for open-sourcing his reference implementation - I hope I did it justice!

if you’ve enjoyed this article then please consider giving me a “like” and follow on Linkedin or X/Twitter. For more similar topics, check out my other AI posts in the forum.

Still not signed up to n8n cloud? Support me by using my n8n affiliate link. Disclaimer: this workflow does not work in n8n cloud just yet!

Need more AI templates? Check out my Creator Hub for more free n8n x AI templates - you can import these directly into your instance!


That’s an insane amount of work there, nice job!
What do you think about the ScrapeNinja web scraping API - is it possible to adapt it here?
It has some neat locally-executed (so, zero-cost) operations, like a smart body content extractor via Mozilla’s Readability package.


@Anthony
Thanks for the tip! Yes, ScrapeNinja would definitely work as a replacement for Apify. I think if we’re able to do the search + get contents in one operation, this would speed up the executions considerably!


Amazing work @Jim_Le!

Maybe not suitable for this use case, but flagging for the future: https://r.jina.ai/YOUR_URL returns markdown and can be used for free at 20 RPM with no API Key, and at higher rates with paid keys: Contact sales

They have their own grounding and DeepSearch services as well.


Jim, great job. Quick question: how do I configure the “DeepResearch Subworkflow” to execute?
I see “Initiate Deep Research” under the Trigger DeepResearch Asynchronously section, but the subworkflow does not trigger.

Hey @Wojtek_T
Welcome to the community!

Subworkflows run as separate executions so they won’t appear to “run” in the current canvas. Essentially subworkflows are like regular workflows but run in the background which is what allows you to go do other stuff whilst you wait. (Docs here if you’re interested)

You need to switch to the executions panel to see these subworkflow executions running.


Hey, this is cool. FYI you can run Google Search and scrape the content right away using Apify’s RAG Web Browser


@Jim_Le
Thanks, I have it all working just fine.
Still some work to be done, like list of citations and such, but I love the initial approach where you ask to clarify some details. Very clever idea.


Those custom forms just look amazing! However, the first one always uses the default values for the custom elements parsed in the “Set Variables” node. The options do not exist in the $json object.
Is there a trick to this? Tested with versions 1.78.1 and 1.80.0

Hey everyone! Thanks for the response and support for this workflow - I really appreciate it :grinning:

Just a quick update that the template has finally made it on my creator page and I’ll be making updates to it there:

Quick run down of updates for this version:

:rocket: Faster Web Search + Scraping
This update now uses Apify’s super convenient RAG Web Browser which basically does the SERP search + scraping of the results’ contents in one request! Thanks @jancurn for letting me know.

Initial testing shows each web scraping loop reduced by 3-5 minutes. I wasn’t able to use standby mode as it would’ve been a bit expensive - it meant keeping the scraper instance running for 30-60 minutes for deeper searches.

:rocket: Bugfixes
@octionic Thanks for letting me know. I also discovered this when upgrading to 1.80.2 - this version “fixes” some of the hacks I used for the form, where custom HTML fields could have field names. I’ve now updated this to use hidden fields instead.

Another fix: the HTML->Notion blocks converter now handles list items better, and uploading to Notion now retries when there are conflicts.
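The retry idea can be sketched as a simple wrapper - this is an illustration of the pattern, not the template’s actual node configuration (n8n’s HTTP Request node also has built-in retry settings you could use instead):

```javascript
// Generic retry wrapper: re-invoke `fn` up to `retries` extra times if it
// throws (e.g. a Notion conflict error), rethrowing the last error if all
// attempts fail. `fn` receives the attempt index for logging/backoff.
function withRetry(fn, retries = 3) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return fn(attempt); // success: return immediately
    } catch (err) {
      lastErr = err;      // conflict or transient error: try again
    }
  }
  throw lastErr;          // exhausted all attempts
}
```

In the real workflow you would wrap the Notion “append blocks” call, ideally adding a short delay between attempts so concurrent uploads stop colliding.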


Finally, here’s an example of how to swap out the web search for YouTube search. You can also do the same with Reddit, internal docs, etc. I’m not planning to publish variations of the same template with different searches, but reach out if you’re having trouble doing so.


Hello. I’m new to n8n, so apologies if this is the wrong place to ask this. Also, I don’t think this issue is down to your template, but I am seeing an “Error fetching options from Notion” message in all of the Notion nodes when I use the template. I cannot see a way of fixing this; your help would be appreciated.

@XGwynn Welcome to the community!

Sounds like a permissions issue.

  1. Make sure your API key is valid
  2. Make sure your API Key is connected to the page/table

Thanks, that worked! I’m now struggling to get Apify working. I will try myself before using any more of your time…

Got it working! Thanks again for your help!

This work is fantastic! @Jim_Le, I’m using it for many of my use cases. I wanted to ask if we could leverage this for deeper research, such as analyzing pages using ‘search’, like the image below.

Some websites feature pagination. How can we effectively extract data from these sites?


(English isn’t my first language, so apologies in advance.)
Hello Jim, first of all, I’d like to say thanks for your efforts in creating detailed and useful material for the community. I started using n8n 1 month ago, and it seems like all the work needed to launch an MVP company (like I did), which would normally take 3 months or more, could be done in 10 days using a lot of your material.

This one in particular is where I’m most focused. I got it working up to the first part (where the workflow puts the request ID’s row on the Notion page).

But I couldn’t get the subworkflow process through to step 5. Can you explain how I can make the subworkflow work? I’ve activated the workflow, but even then it didn’t work.

Thanks again for your effort, and thanks in advance for the help!

Edit: It seems you updated the workflow 4 days ago (or that’s just your reply’s date and I’ve assumed so) and I’m using one from 10 days ago. Would that be a problem? Also, my instance is updated to the current stable version (cloud).

Thanks @Mohammed_Rifad
Yes of course! It just requires work to replace the current search strategy with one which works for this website - replace steps 8 + 9 in the template. Not an easy task but doable! If you need help modifying this, I’d suggest posting in the “Help me Build my Workflow” section of the forum.

Hey @Robert_Barral Welcome to the community!

Subworkflows run in separate executions and so won’t show on the canvas. Click on the “executions” tab to see the subworkflow running.

Got it!

It’s been over 70 minutes, crazy!

I’m also struggling with Apify. I seem to have some authentication problems. How did you set it up?