See repo: PDF to Markdown Wrapper (pdftomd.sh) is a RAG-workflow-friendly enhancement of Marker that converts a PDF into a single Markdown file. It handles GPU and PyTorch configuration, document splitting and chunking, Base64 image embedding, LLM post-processing and cleanup, and consolidation of output.
Splits large PDFs into chunks (100 pages by default, 10 pages when -l/--llm is enabled) and runs Marker once on the chunk folder (avoids repeated model loads).
Consolidates all chunk markdown into a single .md file.
Optionally embeds images as Base64 (no external asset folders needed).
Optional text-only output that strips image links from the final markdown.
Optional OCR pass via bundled ocr-pdf/ocr-pdf.sh before conversion.
Optional LLM helper via Marker's built-in --use_llm flag.
Automatically uses GPU when available and installs CUDA-enabled torch when needed.
Cleans up intermediate files and attempts to stop spawned processes on exit.
Optional supplemental LLM post-processing step with --clean.
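The chunking rule in the list above (100 pages per chunk by default, 10 when the LLM option is on) can be illustrated with a short Python sketch. This is not the script's actual implementation: the function name `chunk_ranges` and its parameters are hypothetical, and only the two chunk sizes come from the description above.

```python
def chunk_ranges(total_pages, use_llm=False, default_size=100, llm_size=10):
    """Return inclusive, 1-based (start, end) page ranges, one per chunk.

    Hypothetical helper: the 100/10 page sizes mirror the defaults described
    above; everything else is an assumption made for illustration.
    """
    size = llm_size if use_llm else default_size
    return [
        (start, min(start + size - 1, total_pages))  # last chunk may be short
        for start in range(1, total_pages + 1, size)
    ]


if __name__ == "__main__":
    print(chunk_ranges(250))                # three chunks at the default size
    print(chunk_ranges(25, use_llm=True))   # three chunks at the LLM size
```

Smaller chunks in LLM mode keep each request short, while a single Marker run over the whole chunk folder still avoids reloading models per chunk.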
The overall result can be a much cleaner, more streamlined end product that is better suited to RAG pipeline ingestion.
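The Base64 embedding and text-only options listed above both amount to rewriting Markdown image links. The Python sketch below shows one way this could work; it is an assumption-laden illustration, not the wrapper's own code. The names `embed_images`, `strip_images`, and `IMAGE_LINK` are hypothetical, and the regular expression only handles simple `![alt](path)` links without titles.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches simple inline image links such as ![alt](images/fig1.png).
# Assumption: no link titles and no whitespace or ")" inside the target.
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def embed_images(markdown, base_dir):
    """Replace local image links with Base64 data URIs (hypothetical helper)."""

    def to_data_uri(match):
        alt, target = match.group(1), match.group(2)
        path = Path(base_dir) / target
        # Leave remote links, existing data URIs, and missing files untouched.
        if target.startswith(("http://", "https://", "data:")) or not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"![{alt}](data:{mime};base64,{payload})"

    return IMAGE_LINK.sub(to_data_uri, markdown)


def strip_images(markdown):
    """Remove image links entirely, as a text-only output might."""
    return IMAGE_LINK.sub("", markdown)
```

Embedding makes the single .md file self-contained at the cost of a larger file, whereas stripping keeps only the text, which is often all a retrieval index needs.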