Hi,
The data from Amazon Textract is not exactly straight forward, an example is here:
{
"DocumentMetadata": {
"Pages": 1
},
"JobStatus": "SUCCEEDED",
"ExpenseDocuments": [
{
"ExpenseIndex": 1,
"SummaryFields": [
{
"Type": {
"Text": "TOTAL",
"Confidence": 89.66414642333984
},
"LabelDetection": {
"Text": "TOTAL",
"Confidence": 89.2239990234375
},
"ValueDetection": {
"Text": "53900",
"Confidence": 87.2876205444336
},
"PageNumber": 1
},
{
"Type": {
"Text": "INVOICE_RECEIPT_DATE",
"Confidence": 95.58136749267578
},
"LabelDetection": {
"Text": "INVOICE AND\nSUPPLY DATE",
"Confidence": 95.3720474243164
},
"ValueDetection": {
"Text": "04/04/2022",
"Confidence": 94.0542984008789
},
"PageNumber": 1
}
]
}
]
}
What can I use to extract just the Text values from every âLabelDetectionâ and the âValueDetectionâ?
(and possibly return them as JSON data)
I understand there is a âSimplify Responseâ option but that is too simple, it doesnât return the actual TOTAL and some other important data that is required.
There are over 30 âSummaryFieldsâ, as you can see they are not split into unique data sets but multiple âgroupsâ within the Summary Fields block. I need to run through them all. I tried the Item Lists node, but that didnât help/ I wasnât able to use it effectively.
(There was âPolygonâ and âGeometryâ data in the JSON that I removed for brevity.)
Thank you.