Need help to extract a Table from PDF to Google Sheet

Who wants to be my HERO ??

  • Mostly I would like explanation to learn, so if someone could guide me it would be wonderful.

I retrieve a PDF from my bank every month, and I would like to take the text from the PDF and extract the table from it.

  1. I’m not sure Text Manipulation is the best to do that

Todo

  • remove the garbage before the first \nValue date:
    • and after the last transaction
  • dividing transaction per line (again, I focus on \nValue date: )
  • pushing line by line in Google Sheet

data sample {{ $json.text }} received by Text Manipulation

Holder: Last Name Middle Name Name\nAccount: ES00 0000 0000 0000 0000 0000\nAvailable balance: 0.000,00€ (on 02 Oct. 2023)\nDocument on: 02 Oct. 2023 Page 1 of 5\nYour account activity (05 Jul. 2023–02 Oct. 2023)\nTransaction date Transaction Amount Balance\n02 Oct. 2023\nValue date: 02 Oct. 2023\nRecibo Clientes, S.A.U Nº Recibo 0000 0000 000\nBbjmqbh Ref. Mandato 000000000000, De\n-00,00€ 0.000,00€\n02 Oct. 2023\nValue date: 02 Oct. 2023\nTransferencia A Favor De Name LastName Concepto: Name\nLastName / Comercial xxxx Con xxxxxxx\n-00,00€ 0.000,00€\n02 Oct. 2023\nValue date: 02 Oct. 2023\nTraspaso: Personal Expenses -00,00€ 0.000,00€\n02 Oct. 2023\nValue date: 02 Oct. 2023\nTraspaso: Personal Expense -00,00€ 0.000,00€\n02 Oct. 2023

output I would like

  • “26 Sep. 2023
    Value date: 26 Sep. 2023”,“Transferencia De La Base Culture, Maison Des Francophonies, Concepto Remboursement Flam.”,000.00€,“0,000.00€”
  • where "Date","Transaction","Amount","Balance"

because I wish pushing line by line in a Google Sheet (but maybe I’m wrong in my logic)

My WiP Workflow

hi @JOduMonT
I do not have google account, but suppose that should work (if the sample is correct)

example output

1 Like

Pretty close, but not perfect :wink:

Obviously, I forgot to mention that the PDF is multipage, in that case 5
each page has a header and a footer which should be thrown in the garbage.

Like in the header, every line who starts with

  • Holder:
  • Account:
  • Available balance:
  • Document on:
  • Your account activity (
  • Transaction date Transaction Amount Balance

I say start with because I do something like this in my VS-Codium and they are new line
image
I replace \n by a return carriage

Also for an unknown reason
when the transaction started with Bizum or Traspaso: they are on one line while order are multilines

here a cut and paste directly from the PDF

25 Sep. 2023
Value date: 24 Sep. 2023

Bizum De xxx xxxxx xxxxxxx Concepto Cumpleaños 00,00€ 00.000,08€

25 Sep. 2023
Value date: 24 Sep. 2023

Transaccion Contactless En Fun Jump Sl, Massanassa, Tarj.
:*000000

-000,00€ 00.015,08€

25 Sep. 2023
Value date: 25 Sep. 2023

Traspaso: Family: Lunch At Artysana -00,40€ 00.003,08€

25 Sep. 2023
Value date: 25 Sep. 2023

Traspaso: Grocery: Consum -00,51€ 00.002,48€

In any case; I see I have to learn javascript, do you have any good resources about manipulating data with javascript ?

@JOduMonT

try this one (it is more complex as it uses regex, but maybe it will act better)

In general, that request is not about javascript itself, it is rather about data manipulation and working with strings.

Some useful resources:

1 Like

I greatly appreciate your help and those link.
I’ll try it soon and come back to you :slight_smile: