6 hours on a literature review dropped to 3 minutes — here's the n8n workflow

Literature reviews have a mechanical phase that slows everything down. You need citations for a topic. So you open Google Scholar, search, check citation counts to find authoritative papers, copy references into your doc, format them in APA, realize you also need BibTeX for the submission system, reformat again. Repeat for every topic you’re covering.

For a paper with 8-10 distinct search queries, that’s easily 2-3 hours of mechanical work before writing anything.

Built a workflow that turns a spreadsheet into a citation engine. Type a query, set Status to Pending, wait 20 seconds — top 10 papers by citation count come back with APA and BibTeX ready to paste.

What it does

Runs every minute → reads all rows with Status = “Pending” → searches Semantic Scholar, PubMed, and ArXiv simultaneously → ranks results by citation count → writes top 10 papers + full citations back to the sheet → status flips to “Completed”

Queue as many queries as you want. All pending rows get processed each cycle.

How to use it

  1. Open your Google Sheet

  2. Column A: your search query — “transformer attention mechanism”, “CRISPR off-target effects”, whatever

  3. Column B: type Pending

  4. Wait about 20 seconds

  5. Results appear in the same row — top papers list, citation counts, APA citations, BibTeX

To re-run a query with fresh results: change Status back to Pending.

What comes back per query

  • Results Found — total papers across all three databases

  • Top Papers — numbered list of top 10 most-cited papers with year and citation count

  • Total Citations — combined count across top 10

  • Most Cited — the single highest-cited paper with its count

  • APA Citations — ready-to-paste formatted references for all 10

  • BibTeX Citations — ready-to-paste .bib entries for all 10

  • Search Date

Example output for “large language model hallucination”:


Top Papers:

1. Survey of Hallucination in Natural Language Generation (2023) - 1,847 citations

2. TruthfulQA: Measuring How Models Mimic Human Falsehoods (2022) - 1,203 citations

3. Language Models Know What They Know (2022) - 891 citations

...

APA:

Ji, S., Lee, N., et al. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys. https://doi.org/10.1145/3571730

BibTeX:

@article{ji2023survey,

title={Survey of hallucination in natural language generation},

author={Ji, S., Lee, N., et al.},

year={2023},

journal={ACM Computing Surveys},

doi={10.1145/3571730}

}

Databases searched

  • Semantic Scholar — broad coverage across all disciplines

  • PubMed — medical and life sciences

  • ArXiv — physics, math, CS, AI preprints

Setup

You’ll need:

  • Google Sheets (free — this workflow is entirely spreadsheet-driven, no Gmail or Slack needed)

  • n8n instance (self-hosted — uses PDF Vector community node)

  • PDF Vector account (free tier: 100 credits/month)

About 10 minutes to configure — the simplest setup in this series.

Download

Workflow JSON:

Academic-paper-finder.json

Full workflow collection:

khanhduyvt0101/workflows


Setup Guide

Step 1: Get your PDF Vector API key

Sign up at pdfvector.com — free plan works for testing.

Step 2: Create your Sheet

Headers in Row 1:


Search Query | Status | Results Found | Top Papers | Total Citations | Most Cited | APA Citations | BibTeX Citations | Search Date

Step 3: Import and configure

Download JSON → n8n → Import from File.

Read Queries: Connect Sheets, paste Sheet ID — pre-filtered to Status = Pending

PDF Vector - Search Papers: Add credential, paste API key

Update Results: Same Sheets credential and Sheet ID — matches on Search Query column to update the correct row

Step 4: Test it

Add a query to column A, type Pending in column B, wait about 20 seconds. The row updates with results.


Notes on databases

  • Semantic Scholar: strong for STEM, CS, economics, biology

  • PubMed: highly reliable for medical and life sciences

  • ArXiv: excellent for preprints in physics, math, CS, AI — citation counts tend to be lower than final published versions

BibTeX keys are auto-generated as [firstauthor][year][firstword] — standardize for large bibliographies.

Cost

2 credits per academic search. A literature review with 10 queries uses about 20 credits — well within the free tier.

Customizing it

Filter by year: The workflow already supports yearFrom and yearTo in the Search Papers node — add columns to your Sheet and map them to filter only recent work

Search one database: Edit the providers array in the Search Papers node — use ["semantic-scholar"] for general research or ["pubmed"] for clinical topics

Export to Notion: After Update Results, add a Notion node to create a page per query with the citation list


PDF Vector n8n integration

Full workflow collection

Questions? Drop a comment.

1 Like

nice work on the deduplication – we had to build something similar once and the multi-database search is smart. one thing: how do you handle citations that show up across all three databases? just relying on doi matching?