How I analyse GoogleBot visits with a bash script and n8n

I know GoogleBot crawls my site multiple times a day, but I wanted to monitor those visits and run some data analysis on them.

So I wrote a bash script that triggers the n8n webhook URL whenever the GoogleBot visits.

Server-side setup

  • Change the logFile path to match your setup. (Make sure to adjust it if you’re using another web server such as Apache or OpenLiteSpeed.)
  • Use the n8n webhook production URL.
  • Create a new file called GoogleBot.sh and paste the code below.
#!/bin/bash

# Webhook URL
webhookURL="PASTE n8n WEBHOOK URL HERE"

# This is the default Nginx log path.
# You can replace the path according to your setup
logFile=/var/log/nginx/access.log

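# Follow the log from its current end; -F keeps following after log rotation
tail -Fn0 "$logFile" | \
while read -r line ; do
    # -q only tests for a match and does not print the line
    if echo "$line" | grep -q "Googlebot"
    then
        curl --silent --output /dev/null \
             -X POST \
             "$webhookURL" \
             -d 'GoogleBot=Yes'
    fi
done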

Save and close it.

Then make it executable with this command: chmod +x GoogleBot.sh
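
Before handing the script to systemd, the matching step can be tried on its own against sample log lines. This is only a minimal sketch: is_googlebot is a hypothetical helper name, and both sample lines are made up for illustration in the Nginx "combined" log format.

```shell
#!/bin/bash
# Hypothetical helper: exits 0 when a log line mentions Googlebot, non-zero otherwise.
is_googlebot() {
    printf '%s\n' "$1" | grep -q "Googlebot"
}

# Made-up sample lines (Nginx "combined" format), for illustration only.
bot_line='66.249.66.1 - - [10/Oct/2021:13:55:36 +0000] "GET /about HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
user_line='203.0.113.7 - - [10/Oct/2021:13:56:01 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"'

for line in "$bot_line" "$user_line"; do
    if is_googlebot "$line"; then
        echo "match"
    else
        echo "no match"
    fi
done
```

The first line should be reported as a match and the second should not. Note that the match is case-sensitive, so the pattern has to be spelled "Googlebot" as it appears in the user agent.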

To keep it running, we will create a systemd service.

# Create a new service file
touch /etc/systemd/system/gbotnotify.service

# Open the service file in an editor
nano /etc/systemd/system/gbotnotify.service
  • Now paste the code below.
  • If you’re using a different location, make sure to change WorkingDirectory and the GoogleBot.sh file path.
[Unit]
Description=Fire a webhook when GoogleBot visits your website.
Requires=network.target
After=network.target

[Service]
Type=simple
WorkingDirectory=/root
ExecStart=/bin/bash /root/GoogleBot.sh
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Save and close it.

Now reload systemd, then enable and start the service with these commands.

systemctl daemon-reload
systemctl enable gbotnotify
systemctl start gbotnotify
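
To confirm the service actually came up, its state and recent journal entries can be inspected. A small sketch, assuming the unit name gbotnotify from the service file above; check_gbotnotify is a hypothetical wrapper name.

```shell
#!/bin/bash
# Hypothetical helper; the unit name matches the gbotnotify.service file created earlier.
check_gbotnotify() {
    # Prints the unit state, e.g. "active" when the service is running
    systemctl is-active gbotnotify
    # Show the 20 most recent journal lines for the unit without opening a pager
    journalctl -u gbotnotify -n 20 --no-pager
}
```

Run check_gbotnotify as root (or with sudo). If the state is not active, the journal lines usually show why, for example a wrong path in ExecStart.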

n8n Workflow setup

Inside n8n, paste the code below. It’ll create a Webhook node and an IF node.

{
  "nodes": [
    {
      "parameters": {
        "httpMethod": "POST",
        "path": "91d21d0a-8025-4280-920e-4dd4e2cab923",
        "options": {}
      },
      "name": "Webhook",
      "type": "n8n-nodes-base.webhook",
      "typeVersion": 1,
      "position": [
        500,
        230
      ],
      "webhookId": "91d21d0a-8025-4280-920e-4dd4e2cab923"
    },
    {
      "parameters": {
        "conditions": {
          "string": [
            {
              "value1": "={{$node[\"Webhook\"].json[\"body\"][\"GoogleBot\"]}}",
              "value2": "Yes"
            }
          ]
        }
      },
      "name": "IF",
      "type": "n8n-nodes-base.if",
      "typeVersion": 1,
      "position": [
        700,
        230
      ]
    }
  ],
  "connections": {
    "Webhook": {
      "main": [
        [
          {
            "node": "IF",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
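
Once the workflow is imported and activated, the webhook can be exercised by hand instead of waiting for the next crawl. A minimal sketch: send_test_hit is a hypothetical helper that posts the same GoogleBot=Yes form field as the server script and prints the HTTP status code returned by n8n.

```shell
#!/bin/bash
# Hypothetical helper: $1 is the n8n webhook URL.
send_test_hit() {
    # Discard the response body and print only the HTTP status code
    curl --silent --output /dev/null \
         --write-out '%{http_code}\n' \
         -X POST "$1" \
         -d 'GoogleBot=Yes'
}
```

Call it as send_test_hit "PASTE n8n WEBHOOK URL HERE". A 2xx status suggests the request reached the workflow, and the execution should then show up in n8n with the IF node taking the true branch.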

Then from the true branch connect whatever nodes you like.

Here is a simple example.

  • This workflow starts when the webhook is triggered, then sends a message to Telegram and appends a row to a Google Sheet.
  • You can extend this workflow with a Function node that adds the date and time to the data, so a graph can be created to monitor the crawl rate.
  • Here the data is just “GoogleBot HIT”, which you can modify as needed. (I’m just showing you an example.)
  • Once the data is inside n8n, you can do a lot with it.
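
The date and time could also be taken from the log line itself on the server, so the sheet records when the crawl happened and which page was requested. This is a sketch under assumptions: it expects the Nginx "combined" log format, the helper names hit_time, hit_path and send_hit are hypothetical, and the extra form fields time and path are made-up names that the workflow would have to read from the webhook body next to GoogleBot.

```shell
#!/bin/bash
# Hypothetical helpers, assuming the Nginx "combined" log format.

# Field 4 looks like "[10/Oct/2021:13:55:36"; strip the leading bracket.
hit_time() {
    printf '%s\n' "$1" | awk '{ sub(/^\[/, "", $4); print $4 }'
}

# Field 7 is the requested path, e.g. "/about".
hit_path() {
    printf '%s\n' "$1" | awk '{ print $7 }'
}

# $1 = webhook URL, $2 = raw log line.
# curl joins the -d and --data-urlencode values with "&" into one form body.
send_hit() {
    curl --silent --output /dev/null \
         -X POST "$1" \
         -d 'GoogleBot=Yes' \
         --data-urlencode "time=$(hit_time "$2")" \
         --data-urlencode "path=$(hit_path "$2")"
}
```

Inside the while loop of GoogleBot.sh, the curl call could then be swapped for send_hit "$webhookURL" "$line".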

Caution: If you’re using Google Sheets, every append counts against the Sheets API usage limits. Refer to: Usage Limits | Sheets API | Google Developers
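
If one request per hit gets close to those limits, one option is to count hits on the server and send totals less often. The sketch below only does the counting part: hits_per_hour is a hypothetical helper that reads log lines on standard input, again assuming the Nginx "combined" format, and prints one line per hour with the number of Googlebot hits.

```shell
#!/bin/bash
# Hypothetical helper: reads access-log lines on stdin, prints "<day/month/year:hour> <count>".
hits_per_hour() {
    grep "Googlebot" | awk '{
        # $4 looks like "[10/Oct/2021:13:55:36"; keep "10/Oct/2021:13"
        hour = substr($4, 2, 14)
        count[hour]++
    }
    END { for (h in count) print h, count[h] }' | sort
}
```

For example, hits_per_hour < /var/log/nginx/access.log prints the hourly totals for the current log file. The sort is plain text order, so within one day the hours come out in sequence, but lines from different days or months are not guaranteed to be chronological.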

Possibilities are endless. :smiley:

Really cool. Thanks a lot for sharing @mcnaveen !

Very cool! Love all the uses for n8n that you come up with :slight_smile:
