Accessing Other AWS Endpoints with Custom HTTP Request

craig.mcelroy.nuso · September 15, 2023, 12:14am

Describe the problem/error/question

Does anyone have an example of how to format a custom HTTP request to work with a given custom AWS endpoint? Specifically, I need to use Textract.DetectDocumentText, which is not available in the current AWS Textract node. I have tried a handful of permutations, but always get either a 400 or 404 response. My current attempt is below.

The included example uses the built-in AWS Textract node to show that the provided credentials are valid for both S3 access and Textract functions in the us-east-1 region. Unfortunately, that built-in n8n node only includes Analyze Receipt or Invoice, which doesn’t meet our needs. The tip in that node indicates that what I am trying to do should be possible, but unfortunately the linked Learn more docs do not indicate exactly how to go about it.


What is the error message (if any)?

{"__type":"MissingAuthenticationTokenException","message":"Missing Authentication Token"}

Please share your workflow

Share the output returned by the last node

[
  {
    "error": {
      "message": "Bad request - please check your parameters",
      "timestamp": 1694802740484,
      "name": "NodeApiError",
      "description": "{\"__type\":\"MissingAuthenticationTokenException\",\"message\":\"Missing Authentication Token\"}",
      "context": {},
      "cause": {
        "message": "400 - \"{\\\"__type\\\":\\\"MissingAuthenticationTokenException\\\",\\\"message\\\":\\\"Missing Authentication Token\\\"}\"",
        "name": "Error",
        "stack": "Error: 400 - \"{\\\"__type\\\":\\\"MissingAuthenticationTokenException\\\",\\\"message\\\":\\\"Missing Authentication Token\\\"}\"\n    at createError (/usr/local/lib/node_modules/n8n/node_modules/axios/lib/core/createError.js:16:15)\n    at settle (/usr/local/lib/node_modules/n8n/node_modules/axios/lib/core/settle.js:17:12)\n    at RedirectableRequest.handleResponse (/usr/local/lib/node_modules/n8n/node_modules/axios/lib/adapters/http.js:238:9)\n    at RedirectableRequest.emit (node:events:525:35)\n    at RedirectableRequest.emit (node:domain:489:12)\n    at RedirectableRequest._processResponse (/usr/local/lib/node_modules/n8n/node_modules/follow-redirects/index.js:356:10)\n    at ClientRequest.RedirectableRequest._onNativeResponse (/usr/local/lib/node_modules/n8n/node_modules/follow-redirects/index.js:62:10)\n    at Object.onceWrapper (node:events:628:26)\n    at ClientRequest.emit (node:events:525:35)\n    at ClientRequest.emit (node:domain:489:12)\n    at HTTPParser.parserOnIncomingClient (node:_http_client:693:27)\n    at HTTPParser.parserOnHeadersComplete (node:_http_common:128:17)\n    at TLSSocket.socketOnData (node:_http_client:534:22)\n    at TLSSocket.emit (node:events:513:28)\n    at TLSSocket.emit (node:domain:489:12)\n    at addChunk (node:internal/streams/readable:315:12)\n    at readableAddChunk (node:internal/streams/readable:289:9)\n    at TLSSocket.Readable.push (node:internal/streams/readable:228:10)\n    at TLSWrap.onStreamRead (node:internal/stream_base_commons:190:23)"
      }
    }
  }
]

Information on your n8n setup

n8n version: 0.236.3
Database (default: SQLite): postgresql
n8n EXECUTIONS_PROCESS setting (default: own, main): main
Running n8n via (Docker, npm, n8n cloud, desktop app): docker
Operating system: debian

EmeraldHerald · September 15, 2023, 9:29am

Hi @craig.mcelroy.nuso Welcome to the community

This looks like an authentication error - can you double check that the AWS region you’re specifying is the correct one, and that you’ve set up appropriate permissions for AWS Textract as described in Granting Programmatic Access - Amazon Textract? For example, to call DetectDocumentText, you need permission to perform textract:DetectDocumentText, if you aren’t using something like AmazonTextractFullAccess for unrestricted access.

I’m not the most familiar with AWS, but I believe you can do something like:

curl -v -X $HTTP_METHOD https://$API_ID.execute-api.$AWS_REGION.amazonaws.com/$STAGE_NAME/$RESOURCE_NAME

to test out your credentials and ensure they’re working, too

craig.mcelroy.nuso · September 15, 2023, 2:13pm

I had already verified that the IAM user has the necessary permissions by first testing with the AWS CLI. Sorry for not noting this in my initial report. The following AWS CLI test, which mirrors what I am trying to do in the n8n flow, works successfully:

aws --profile n8n-textract textract detect-document-text --region us-east-1 --document '{ "S3Object": { "Bucket": "my-textract-scratch-bucket", "Name": "sample.pdf" } }'

I have also edited my original post and workflow to show that the built-in AWS Textract node works with the same credentials and the same S3 object, and to point out the tip in the AWS Textract node saying that custom API calls should be possible.
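For reference, the CLI call above maps onto a single signed HTTPS POST. The following is a minimal sketch of that request using only the Python standard library, not a description of what n8n does internally. The function name `build_textract_request` and its parameters are hypothetical; the endpoint host pattern, the `application/x-amz-json-1.1` content type, and the `X-Amz-Target: Textract.DetectDocumentText` header follow the Textract JSON protocol, and the signing steps follow AWS Signature Version 4.

```python
import datetime
import hashlib
import hmac
import json


def _hmac(key: bytes, msg: str) -> bytes:
    """HMAC-SHA256 helper used for the SigV4 key derivation chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def build_textract_request(access_key, secret_key, region, bucket, name, now=None):
    """Return (url, headers, body) for a SigV4-signed DetectDocumentText call.

    Hypothetical helper for illustration; it only builds the request and
    does not send it.
    """
    service = "textract"
    host = f"{service}.{region}.amazonaws.com"
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    body = json.dumps({"Document": {"S3Object": {"Bucket": bucket, "Name": name}}})
    headers = {
        "content-type": "application/x-amz-json-1.1",
        "host": host,
        "x-amz-date": amz_date,
        "x-amz-target": "Textract.DetectDocumentText",
    }

    # Step 1: canonical request (lowercase header names, sorted by name).
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    canonical_request = "\n".join(
        ["POST", "/", "", canonical_headers, signed_headers, payload_hash]
    )

    # Step 2: string to sign, scoped to date/region/service.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
        ]
    )

    # Step 3: derive the signing key and compute the signature.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    signature = hmac.new(
        k_signing, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    headers["Authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return f"https://{host}/", headers, body
```

The returned URL, headers, and body can be compared against whatever a tool actually sends (for example via webhook.site). If the `Authorization` header is absent from the outgoing request, a MissingAuthenticationTokenException is the expected response.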

Jelle_de_Rijke · September 18, 2023, 12:53pm

I too struggle with this issue. For me, the default AWS Textract operation Analyze Receipt or Invoice returns ‘UnsupportedDocumentException’ despite the document having a correct format. The document is retrieved using an S3 node.

So I switch to ‘Custom API Call’. Apparently this is supposed to handle the authentication as described here:

I now add an HTTP Request node after the AWS Textract node and select Authentication > Predefined Credential Type > AWS credentials.

But now I get the ‘MissingAuthenticationTokenException’.

So I use webhook.site to inspect the call n8n sends out. And here I suspect 2 issues. Number 1 is that the binary data is not a Base64 string, but [object Object]. This is the content of the body in webhook.site:

{
  "Document": {
    "Bytes": "[object Object]"
  },
  "FeatureTypes": [
    "FORMS",
    "TABLES"
  ]
}

And issue number 2 is that I see no headers related to any authentication, whatsoever. So the MissingAuthenticationTokenException seems to be correct.
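On the first issue: when the request body is plain JSON over HTTP, `Document.Bytes` has to be a Base64 string of the raw file bytes, so the binary data must be encoded before it is placed in the body. A minimal sketch of building such a body is shown here; the function name `build_analyze_document_body` is hypothetical and only illustrates the expected shape.

```python
import base64
import json


def build_analyze_document_body(image_bytes: bytes, feature_types=("FORMS", "TABLES")) -> str:
    """Build a JSON body for AnalyzeDocument with Base64-encoded bytes.

    Hypothetical helper for illustration; it does not send anything.
    """
    # Base64 output is bytes, so decode to ASCII text before JSON serialization.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(
        {
            "Document": {"Bytes": encoded},
            "FeatureTypes": list(feature_types),
        }
    )
```

A body built this way carries a long Base64 string under `Bytes` rather than `[object Object]`, which is what results when a binary object is stringified instead of encoded.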

When I check the credentials in n8n > Credentials, I get the nice ‘Connection tested successfully’.

Am I understanding the Textract node incorrectly or is it maybe bugging out?

Thanks for the support.

Bg

Jelle de Rijke

EDIT: Just to be sure there is nothing wrong with my credentials, I created a super simple setup in a Python notebook with my creds (on a local VS Code setup; this won’t work in n8n since the library is not available in the Code node). This works like a charm, so no issues with the creds.

For anyone interested:

import io

import boto3
from PIL import Image

# Placeholder credentials; replace with your own values
aws_access_key_id = 'mykeyid'
aws_secret_access_key = 'mykeysecret'
aws_region = 'myregion'

client = boto3.client(
    'textract',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=aws_region
)

# Open an image file
with Image.open('somefile.png') as img:
    # Convert image to bytes
    buffer = io.BytesIO()
    img.save(buffer, 'PNG')
    image_bytes = buffer.getvalue()

# boto3 takes the raw bytes and handles the Base64 encoding itself
response = client.analyze_document(
    Document={'Bytes': image_bytes},
    FeatureTypes=['FORMS']
)

This analyses the document. To get the proper key-value matches, I use this code:

# Step 1: Build a block map
block_map = {block['Id']: block for block in response['Blocks']}

# Step 2: Identify key-value pairs
key_map = {}
value_map = {}

for block in response['Blocks']:
    if block['BlockType'] == 'KEY_VALUE_SET':
        if 'KEY' in block['EntityTypes']:
            key_map[block['Id']] = block
        else:
            value_map[block['Id']] = block

# Step 3: Extract key-value pairs
for key_block_id, key_block in key_map.items():
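    # Follow the relationship of Type 'VALUE' rather than assuming it is listed first
    value_ids = [
        block_id
        for relationship in key_block.get('Relationships', [])
        if relationship['Type'] == 'VALUE'
        for block_id in relationship['Ids']
    ]
    # Fall back to an empty dict so the checks below are safe when no value exists
    value_block = value_map.get(value_ids[0], {}) if value_ids else {}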
    
    # Extracting keys
    key = ''
    if 'Relationships' in key_block:
        key_relationships = key_block['Relationships']
        for relationship in key_relationships:
            if 'Ids' in relationship:
                for word_id in relationship['Ids']:
                    word_block = block_map.get(word_id, {})
                    if 'Text' in word_block:
                        key += word_block['Text'] + ' '
    
    # Extracting values
    value = ''
    if 'Relationships' in value_block:
        value_relationships = value_block['Relationships']
        for relationship in value_relationships:
            if 'Ids' in relationship:
                for word_id in relationship['Ids']:
                    word_block = block_map.get(word_id, {})
                    if 'Text' in word_block:
                        value += word_block['Text'] + ' '
    
    print(f'Key: {key.strip()}, Value: {value.strip()}')

Now, quite obviously, I was hoping not to need to create API endpoints and host this Python code for this PDF processing. I’d much prefer to have a simple n8n setup for this case.

craig.mcelroy.nuso · September 21, 2023, 3:48am

@Jelle_de_Rijke Looks like we are hitting the same issue with regard to AWS authentication working as advertised with a custom HTTP request.

As for the issue you noted above, this is actually because PDF is not always supported by the synchronous versions of the Textract calls. If you do some googling on that error, it seems that the synchronous calls only support single-page PDFs, whereas multi-page PDFs must be handled using the “start…” asynchronous versions of the calls, followed by the appropriate call to retrieve the results once the job completes. The AWS docs are pretty poor on that limitation for the synchronous calls, but do indicate that the asynchronous calls support multi-page documents.
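The asynchronous flow described above (start a job, then fetch the results once it completes) can be sketched as follows. This is a minimal sketch under stated assumptions: `detect_text_async`, `poll_seconds`, and `max_polls` are hypothetical names, and `client` is assumed to be a boto3 Textract client (e.g. `boto3.client('textract')`), whose `start_document_text_detection` and `get_document_text_detection` methods are the asynchronous counterparts of DetectDocumentText. It polls for simplicity; it treats any final status other than SUCCEEDED as an error.

```python
import time


def detect_text_async(client, bucket, name, poll_seconds=5.0, max_polls=60):
    """Start an async text-detection job on an S3 object and return all Blocks.

    Hypothetical helper; `client` is assumed to be a boto3 Textract client.
    """
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": name}}
    )
    job_id = job["JobId"]

    for _ in range(max_polls):
        result = client.get_document_text_detection(JobId=job_id)
        status = result["JobStatus"]

        if status == "IN_PROGRESS":
            # Job not finished yet; wait and ask again.
            time.sleep(poll_seconds)
            continue

        if status != "SUCCEEDED":
            raise RuntimeError(f"Textract job {job_id} ended with status {status}")

        # Results are paginated; follow NextToken until it is absent.
        blocks = list(result["Blocks"])
        while result.get("NextToken"):
            result = client.get_document_text_detection(
                JobId=job_id, NextToken=result["NextToken"]
            )
            blocks.extend(result["Blocks"])
        return blocks

    raise TimeoutError(f"Textract job {job_id} did not finish after {max_polls} polls")
```

Note that the asynchronous calls read the document from S3 via `DocumentLocation`; they do not accept inline bytes.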

tabacchi · October 2, 2023, 11:50am

I’m also facing the same challenges. Have either of you been successful in getting your workflows to run smoothly for asynchronous operations?

Jelle_de_Rijke · October 2, 2023, 1:08pm

I do appreciate your response, @craig.mcelroy.nuso.

However, I do not use a PDF, and definitely not a multi-page PDF.
I included 2 example invoices here, one in jpg and one in png format:

These still give me the same error.

If anyone can successfully run Textract with these files, that would be very helpful.

Sadly it seems Textract in n8n is broken, at least for me!
