6 hours on a literature review dropped to 3 minutes — here's the n8n workflow

Truong · April 19, 2026, 4:52am

Literature reviews have a mechanical phase that slows everything down. You need citations for a topic. So you open Google Scholar, search, check citation counts to find authoritative papers, copy references into your doc, format them in APA, realize you also need BibTeX for the submission system, reformat again. Repeat for every topic you’re covering.

For a paper with 8-10 distinct search queries, that’s easily 2-3 hours of mechanical work before writing anything.

Built a workflow that turns a spreadsheet into a citation engine. Type a query, set Status to Pending, and on the next run (it polls every minute) the top 10 papers by citation count come back with APA and BibTeX ready to paste.

What it does

Runs every minute → reads all rows with Status = “Pending” → searches Semantic Scholar, PubMed, and ArXiv simultaneously → ranks results by citation count → writes top 10 papers + full citations back to the sheet → status flips to “Completed”

Queue as many queries as you want. All pending rows get processed each cycle.
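How duplicates across the three databases get merged before ranking isn't shown above. Here is one minimal way to do the filter, merge, and rank step. It assumes each provider returns records with hypothetical `title`, `doi`, and `citations` fields, and the real PDF Vector response may name them differently. Each paper is keyed by its normalized DOI, with a normalized title as the fallback, and whichever copy reports more citations is kept.

```python
from typing import Optional

# Hypothetical record shape: {"title": str, "doi": str | None, "citations": int}.
# The actual PDF Vector response fields may be named differently.


def pending_rows(rows: list[dict]) -> list[dict]:
    """Keep only sheet rows queued for processing."""
    return [row for row in rows if row.get("Status") == "Pending"]


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip a resolver prefix such as https://doi.org/."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def dedupe_key(paper: dict) -> str:
    """Prefer the DOI; fall back to a case- and whitespace-normalized title."""
    doi = normalize_doi(paper.get("doi"))
    if doi:
        return "doi:" + doi
    return "title:" + " ".join(paper.get("title", "").lower().split())


def merge_and_rank(results_by_provider: dict[str, list[dict]],
                   top_n: int = 10) -> tuple[int, list[dict]]:
    """Merge provider results, drop duplicates, return (unique count, top N by citations)."""
    merged: dict[str, dict] = {}
    for papers in results_by_provider.values():
        for paper in papers:
            key = dedupe_key(paper)
            current = merged.get(key)
            # When two databases return the same paper, keep the higher citation count.
            if current is None or paper.get("citations", 0) > current.get("citations", 0):
                merged[key] = paper
    ranked = sorted(merged.values(), key=lambda p: p.get("citations", 0), reverse=True)
    return len(merged), ranked[:top_n]
```

DOI matching alone won't merge a preprint with its published version when they carry different DOIs. A record with a DOI also won't merge with a title-only copy of the same paper. A fuzzy title comparison would be the next refinement if that matters for your field.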

How to use it

Open your Google Sheet
Column A: your search query — “transformer attention mechanism”, “CRISPR off-target effects”, whatever
Column B: type Pending
Wait for the next run (the workflow polls every minute)
Results appear in the same row — top papers list, citation counts, APA citations, BibTeX

To re-run a query with fresh results: change Status back to Pending.

What comes back per query

Results Found — total papers across all three databases
Top Papers — numbered list of top 10 most-cited papers with year and citation count
Total Citations — combined count across top 10
Most Cited — the single highest-cited paper with its count
APA Citations — ready-to-paste formatted references for all 10
BibTeX Citations — ready-to-paste .bib entries for all 10
Search Date (when the query was run)
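This sketch shows how the per-query columns could be assembled from the ranked list. It assumes hypothetical paper dicts with `title`, `year`, and `citations`. The Top Papers line format mirrors the example output below, while the Most Cited format and the ISO Search Date are assumptions. Results Found is passed in separately because it counts every result, not just the top 10.

```python
from datetime import date
from typing import Optional


def format_top_papers(top: list[dict]) -> str:
    """Numbered list in the same shape as the example output."""
    return "\n".join(
        f"{i}. {p['title']} ({p['year']}) - {p['citations']:,} citations"
        for i, p in enumerate(top, start=1)
    )


def build_result_row(query: str, results_found: int, top: list[dict],
                     apa: list[str], bibtex: list[str],
                     today: Optional[date] = None) -> dict:
    """Assemble one sheet row keyed by the header names (some formats assumed)."""
    most = max(top, key=lambda p: p["citations"]) if top else None
    return {
        "Search Query": query,
        "Status": "Completed",
        "Results Found": results_found,
        "Top Papers": format_top_papers(top),
        "Total Citations": sum(p["citations"] for p in top),
        # Assumed format for the single highest-cited paper.
        "Most Cited": f"{most['title']} ({most['citations']:,} citations)" if most else "",
        "APA Citations": "\n\n".join(apa),
        "BibTeX Citations": "\n\n".join(bibtex),
        # Assumed ISO date; the workflow's actual date format isn't shown.
        "Search Date": (today or date.today()).isoformat(),
    }
```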

Example output for “large language model hallucination”:


Top Papers:

1. Survey of Hallucination in Natural Language Generation (2023) - 1,847 citations

2. TruthfulQA: Measuring How Models Mimic Human Falsehoods (2022) - 1,203 citations

3. Language Models Know What They Know (2022) - 891 citations

...

APA:

Ji, S., Lee, N., et al. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys. https://doi.org/10.1145/3571730

BibTeX:

@article{ji2023survey,
  title={Survey of hallucination in natural language generation},
  author={Ji, S. and Lee, N. and others},
  year={2023},
  journal={ACM Computing Surveys},
  doi={10.1145/3571730}
}
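The sketch below is a minimal APA-style formatter that reproduces the shape of the reference above. It assumes authors arrive as hypothetical (family, given) pairs. It mirrors the example's "first two authors, et al." shortcut. Full APA 7 lists up to 20 authors, so treat this as a sketch rather than a complete APA implementation. The title is used as returned, so sentence-casing is left to the data source.

```python
def initials(given: str) -> str:
    """'Edgar Allan' -> 'E. A.'"""
    return " ".join(part[0].upper() + "." for part in given.split())


def apa_authors(authors: list[tuple[str, str]], max_listed: int = 2) -> str:
    """Format (family, given) pairs; shortens to 'et al.' like the example above."""
    names = [f"{family}, {initials(given)}" for family, given in authors]
    if len(names) > max_listed:
        return ", ".join(names[:max_listed]) + ", et al."
    if len(names) > 1:
        return ", ".join(names[:-1]) + ", & " + names[-1]
    return names[0] if names else ""


def format_apa(paper: dict) -> str:
    """Authors (year). Title. Journal. DOI link, skipping fields that are missing."""
    title = paper["title"].strip()
    if not title.endswith((".", "?", "!")):
        title += "."
    parts = [f"{apa_authors(paper['authors'])} ({paper['year']}).", title]
    if paper.get("journal"):
        parts.append(f"{paper['journal']}.")
    if paper.get("doi"):
        parts.append(f"https://doi.org/{paper['doi']}")
    return " ".join(parts)
```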

Databases searched

Semantic Scholar — broad coverage across all disciplines
PubMed — medical and life sciences
ArXiv — physics, math, CS, AI preprints

Setup

You’ll need:

Google Sheets (free — this workflow is entirely spreadsheet-driven, no Gmail or Slack needed)
n8n instance (self-hosted — uses PDF Vector community node)
PDF Vector account (free tier: 100 credits/month)

About 10 minutes to configure — the simplest setup in this series.

Download

Workflow JSON:

Academic-paper-finder.json

Full workflow collection:

khanhduyvt0101/workflows

Setup Guide

Step 1: Get your PDF Vector API key

Sign up at pdfvector.com — free plan works for testing.

Step 2: Create your Sheet

Headers in Row 1:


Search Query | Status | Results Found | Top Papers | Total Citations | Most Cited | APA Citations | BibTeX Citations | Search Date
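Update Results matches rows on the Search Query column, and results are presumably written under these header names. That means a typo or a stray space in Row 1 is an easy way to break things. Here is a small check, assuming you can fetch the header row as a list of strings. It also confirms that columns A and B hold the query and status, as described in "How to use it".

```python
EXPECTED_HEADERS = [
    "Search Query", "Status", "Results Found", "Top Papers", "Total Citations",
    "Most Cited", "APA Citations", "BibTeX Citations", "Search Date",
]


def header_problems(header_row: list[str]) -> list[str]:
    """Compare a fetched Row 1 against the expected headers; return readable issues."""
    problems = []
    for i, header in enumerate(header_row, start=1):
        if header != header.strip():
            problems.append(f"column {i} has stray whitespace: {header!r}")
    if [h.strip() for h in header_row[:2]] != ["Search Query", "Status"]:
        problems.append("columns A and B should be Search Query and Status")
    present = {h.strip() for h in header_row}
    for expected in EXPECTED_HEADERS:
        if expected not in present:
            problems.append(f"missing column: {expected}")
    # Extra columns (e.g. your own notes) are allowed and not reported.
    return problems
```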

Step 3: Import and configure

Download JSON → n8n → Import from File.

Read Queries: Connect your Google Sheets credential and paste the Sheet ID. The node is already filtered to Status = Pending.

PDF Vector - Search Papers: Add credential, paste API key

Update Results: Same Sheets credential and Sheet ID — matches on Search Query column to update the correct row
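Because the update step matches on the Search Query text, two rows with the identical query can't be told apart by that key. A quick pre-check for that case is sketched below. The first-match lookup is an assumption for illustration, not confirmed n8n behaviour, so check how your Google Sheets node version handles multiple matches.

```python
from collections import Counter
from typing import Optional


def find_row_to_update(rows: list[dict], query: str) -> Optional[int]:
    """0-based index of the first row whose Search Query matches exactly, else None.

    First-match behaviour is an illustrative assumption, not confirmed n8n behaviour.
    """
    for index, row in enumerate(rows):
        if row.get("Search Query") == query:
            return index
    return None


def duplicate_queries(rows: list[dict]) -> list[str]:
    """Non-empty queries that appear on more than one row."""
    counts = Counter(row.get("Search Query", "") for row in rows)
    return sorted(q for q, n in counts.items() if n > 1 and q)
```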

Step 4: Test it

Add a query to column A, type Pending in column B, and wait for the next run (up to a minute). The row updates with results.

Notes on databases

Semantic Scholar: strong for STEM, CS, economics, biology
PubMed: highly reliable for medical and life sciences
ArXiv: excellent for preprints in physics, math, CS, AI — citation counts tend to be lower than final published versions

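BibTeX keys are auto-generated as [firstauthor][year][firstword], e.g. ji2023survey. In a large bibliography two papers can end up with the same key, so check for collisions and standardize the keys before merging entries into your .bib file.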
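Here is a sketch of that key pattern with a simple collision rule. The numeric suffix is an assumption, since many tools append a letter instead. Authors arrive as hypothetical (family, given) pairs. Multiple authors are joined with " and ", which is how BibTeX separates names; a comma-separated list would be read as one oddly named author.

```python
import re
import unicodedata


def ascii_slug(text: str) -> str:
    """Lower-case, drop accents, keep only a-z and 0-9 (e.g. 'Müller' -> 'muller')."""
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]", "", folded.lower())


def base_key(paper: dict) -> str:
    """[firstauthor][year][firstword], following the pattern described above."""
    family = paper["authors"][0][0] if paper["authors"] else "anon"
    words = paper["title"].split()
    first_word = ascii_slug(words[0]) if words else ""
    return f"{ascii_slug(family)}{paper['year']}{first_word}"


def assign_keys(papers: list[dict]) -> list[str]:
    """Give every paper a unique key; collisions get a numeric suffix (2, 3, ...)."""
    used: set[str] = set()
    keys = []
    for paper in papers:
        base = base_key(paper)
        key, n = base, 2
        while key in used:
            key = f"{base}{n}"
            n += 1
        used.add(key)
        keys.append(key)
    return keys


def bibtex_entry(key: str, paper: dict) -> str:
    """Render an @article entry with authors separated by ' and '."""
    authors = " and ".join(f"{family}, {given}" for family, given in paper["authors"])
    fields = [("title", paper["title"]), ("author", authors), ("year", str(paper["year"]))]
    if paper.get("journal"):
        fields.append(("journal", paper["journal"]))
    if paper.get("doi"):
        fields.append(("doi", paper["doi"]))
    body = ",\n".join(f"  {name}={{{value}}}" for name, value in fields)
    return f"@article{{{key},\n{body}\n}}"
```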

Cost

2 credits per academic search (one query). A literature review with 10 queries uses about 20 credits, well within the free tier: 100 credits a month covers about 50 searches.
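The budget math fits in a couple of lines. The two constants are the figures quoted above. Counting a re-run (setting Status back to Pending) as another billed search is an assumption.

```python
CREDITS_PER_SEARCH = 2        # quoted above: 2 credits per academic search
FREE_CREDITS_PER_MONTH = 100  # quoted above: free tier allowance


def credits_needed(queries: int, reruns: int = 0) -> int:
    """Credits for a batch, assuming each re-run is billed as a new search."""
    return (queries + reruns) * CREDITS_PER_SEARCH


def searches_left(credits_used: int, budget: int = FREE_CREDITS_PER_MONTH) -> int:
    """Whole searches still affordable this month."""
    return max(budget - credits_used, 0) // CREDITS_PER_SEARCH


print(credits_needed(10))  # 20
print(searches_left(0))    # 50
```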

Customizing it

Filter by year: The workflow already supports yearFrom and yearTo in the Search Papers node — add columns to your Sheet and map them to filter only recent work
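If you add year columns, blank cells are the main thing to handle. The sketch below turns a row into yearFrom/yearTo parameters and leaves out a bound the user didn't fill in. The column names "Year From" and "Year To" are hypothetical. Running this in an n8n Code node before Search Papers assumes your setup runs Python there, and mapping the cells with plain expressions works too if you always fill both columns.

```python
from typing import Optional


def parse_year(cell) -> Optional[int]:
    """Blank or missing cell -> None (no bound); '2020' or 2020.0 -> 2020."""
    if cell is None or str(cell).strip() == "":
        return None
    return int(float(str(cell).strip()))  # raises ValueError on non-numeric text


def year_filter(row: dict, from_col: str = "Year From", to_col: str = "Year To") -> dict:
    """Build yearFrom/yearTo parameters, omitting bounds the user left blank."""
    year_from = parse_year(row.get(from_col))
    year_to = parse_year(row.get(to_col))
    # Swapping a reversed range is a design choice; raising an error is equally valid.
    if year_from is not None and year_to is not None and year_from > year_to:
        year_from, year_to = year_to, year_from
    params = {}
    if year_from is not None:
        params["yearFrom"] = year_from
    if year_to is not None:
        params["yearTo"] = year_to
    return params
```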

Search one database: Edit the providers array in the Search Papers node — use ["semantic-scholar"] for general research or ["pubmed"] for clinical topics
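To choose databases per query instead of editing the node each time, you could add a column (hypothetically named "Databases") and parse it into the providers array. Only the two ids quoted above are known here. The arXiv id isn't given, so look it up in the node's options before adding it.

```python
# Provider ids as quoted above. The arXiv id isn't listed; copy it from the
# Search Papers node's options before adding it to this set.
KNOWN_PROVIDERS = {"semantic-scholar", "pubmed"}


def providers_from_cell(cell, default=None) -> list[str]:
    """Parse a comma-separated cell such as 'pubmed, semantic-scholar' into a providers list."""
    names = [part.strip().lower() for part in str(cell or "").split(",") if part.strip()]
    unknown = [name for name in names if name not in KNOWN_PROVIDERS]
    if unknown:
        raise ValueError(f"unknown provider(s): {', '.join(unknown)}")
    # Drop repeats but keep the user's order; fall back to the default for a blank cell.
    return list(dict.fromkeys(names)) or list(default or [])
```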

Export to Notion: After Update Results, add a Notion node to create a page per query with the citation list

PDF Vector n8n integration

Full workflow collection

Questions? Drop a comment.

Benjamin_Behrens · April 19, 2026, 10:33am

nice work on the deduplication – we had to build something similar once and the multi-database search is smart. one thing: how do you handle citations that show up across all three databases? just relying on doi matching?