Analyse Documents with Textract, not just 'Expenses'

bocaz · April 20, 2022, 8:27pm

Please can you add ‘Document’ scanning to the AWS Textract node. Thank you.

My use case:

Not all invoices (and other documents) can be read by the Textract Expenses reader, because many invoices contain multiple tables that need to be read. The two services are also priced differently on AWS.

I don’t think there’s much difference in the API as mentioned in:

https://github.com/n8n-io/n8n/blob/1c2ca6244cddd729097a053d8ca9df4ad5a08608/packages/nodes-base/nodes/Aws/Textract/AwsTextract.node.ts#L119

      
        
		},
	};

	async execute(this: IExecuteFunctions): Promise<INodeExecutionData[][]> {
		const items = this.getInputData();
		const returnData: IDataObject[] = [];
		let responseData;
		const operation = this.getNodeParameter('operation', 0) as string;
		for (let i = 0; i < items.length; i++) {
			try {
				//https://docs.aws.amazon.com/textract/latest/dg/API_AnalyzeExpense.html
				if (operation === 'analyzeExpense') {
					const binaryProperty = this.getNodeParameter('binaryPropertyName', i) as string;
					const simple = this.getNodeParameter('simple', i) as boolean;

					if (items[i].binary === undefined) {
						throw new NodeOperationError(this.getNode(), 'No binary data exists on item!');
					}

					if ((items[i].binary as IBinaryKeyData)[binaryProperty] === undefined) {
						throw new NodeOperationError(this.getNode(), `No binary data property "${binaryProperty}" does not exists on item!`);

The other relevant API is AnalyzeDocument (Amazon Textract).

I believe this might be the only change needed:

{
   "Document": { 
      "Bytes": blob,
      "S3Object": { 
         "Bucket": "string",
         "Name": "string",
         "Version": "string"
      }
   },
   "FeatureTypes": [ "TABLES" ] // new field, not present in the AnalyzeExpense request
}
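
To make the difference concrete, here is a minimal TypeScript sketch of how the request for AnalyzeDocument could be assembled. It is not taken from the n8n codebase: `buildAnalyzeDocumentRequest`, `FeatureType` and the two interfaces are hypothetical names used only for illustration. It assumes the binary data is already available as a base64 string and that the call uses the `Textract.AnalyzeDocument` target with the `application/x-amz-json-1.1` content type. Only the `TABLES` and `FORMS` feature types are covered, and signing and sending the request are left to whatever helper the node already uses.

```typescript
// Feature types covered by this sketch (AnalyzeDocument accepts others too).
type FeatureType = 'TABLES' | 'FORMS';

// Request body shape: the same Document wrapper as AnalyzeExpense,
// plus the FeatureTypes array.
interface AnalyzeDocumentBody {
	Document: { Bytes: string };
	FeatureTypes: FeatureType[];
}

interface TextractRequest {
	headers: Record<string, string>;
	body: string;
}

// Hypothetical helper (not part of n8n): builds the headers and JSON body.
// `base64Data` is assumed to be the base64-encoded file content.
function buildAnalyzeDocumentRequest(
	base64Data: string,
	featureTypes: FeatureType[] = ['TABLES'],
): TextractRequest {
	if (base64Data.length === 0) {
		throw new Error('No binary data to analyse');
	}
	if (featureTypes.length === 0) {
		throw new Error('At least one feature type is required');
	}

	const body: AnalyzeDocumentBody = {
		Document: { Bytes: base64Data },
		FeatureTypes: featureTypes,
	};

	return {
		headers: {
			'Content-Type': 'application/x-amz-json-1.1',
			// Assumed target name; the only header that differs from AnalyzeExpense.
			'X-Amz-Target': 'Textract.AnalyzeDocument',
		},
		body: JSON.stringify(body),
	};
}

// Usage: request tables and forms for a (dummy) base64 payload.
const exampleRequest = buildAnalyzeDocumentRequest('JVBERi0xLjQ=', ['TABLES', 'FORMS']);
console.log(exampleRequest.headers['X-Amz-Target']);
console.log(exampleRequest.body);
```

If this assumption holds, the node change would mostly be a new `operation` value that swaps the target header and adds `FeatureTypes` to the body. The response format differs from AnalyzeExpense, though, so any "simple" output mode would need its own handling.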

Many thanks

sebasortiz.dev · September 30, 2022, 11:12pm

Hi @bocaz, I was able to get this running in a very hacky way. I'm planning to open a pull request, but if you can run n8n locally you could try it.