How I analyse GoogleBot visits with a bash script and n8n

I know GoogleBot crawls my site multiple times a day, but I wanted to monitor it and do some data analysis.

So I wrote a bash script that calls an n8n webhook URL whenever GoogleBot visits.

Server-side setup

  • Please change the logFile path as per your need. (Also make sure to change it if you’re using another web server like Apache or OpenLiteSpeed.)
  • You can use the n8n webhook production URL
  • Create a new file called GoogleBot.sh and paste the code below
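#!/bin/bash

# Webhook URL (replace the placeholder, keep the quotes)
webhookURL="PASTE n8n WEBHOOK URL HERE"

# This is the default Nginx log path.
# You can replace the path according to your setup
logFile=/var/log/nginx/access.log

# Follow new log lines only; -F keeps following after log rotation
tail -Fn0 "$logFile" | \
while read -r line ; do
    # Fire the webhook for every new line that mentions Googlebot
    # (grep -q only sets the exit status, it prints nothing)
    if echo "$line" | grep -q "Googlebot"
    then
        curl --silent --output /dev/null \
             -X POST \
             "$webhookURL" \
             -d 'GoogleBot=Yes'
    fi
done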

Save and close it.

Then make it executable with this command: chmod +x GoogleBot.sh
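Before setting up the service, it can help to check the matching logic against a sample log line. This is only a minimal sketch: it assumes the Nginx "combined" log format, and the helper name is_googlebot_line and both sample lines are made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: succeeds (exit status 0) when the line mentions Googlebot.
# It uses the same grep match as GoogleBot.sh.
is_googlebot_line() {
    printf '%s\n' "$1" | grep -q "Googlebot"
}

# Made-up sample lines in Nginx "combined" log format.
bot_line='66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
user_line='203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64)"'

if is_googlebot_line "$bot_line"; then
    echo "bot line: match"
fi

if ! is_googlebot_line "$user_line"; then
    echo "user line: no match"
fi
```

If both messages are printed, the pattern behaves as the script expects for this log format.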

To keep it running, we will create a systemd service.

# Create New Service File
touch /etc/systemd/system/gbotnotify.service

# Open the service file in an editor
nano /etc/systemd/system/gbotnotify.service
  • Now paste the code below.
  • If you’re using a different location, make sure to change WorkingDirectory & GoogleBot.sh file path.
[Unit]
Description=Fire a webhook when GoogleBot visits your website.
Requires=network.target
After=network.target

[Service]
Type=simple
WorkingDirectory=/root
ExecStart=/bin/bash /root/GoogleBot.sh
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Save and close it.

Now reload systemd, then enable and start the service with these commands.

systemctl daemon-reload
systemctl enable gbotnotify
systemctl start gbotnotify
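To confirm that the service actually came up, something like the sketch below could be used. The helper name check_service is hypothetical; it only wraps the standard systemctl is-active and journalctl commands and defaults to the gbotnotify unit created above.

```shell
#!/bin/bash

# Hypothetical helper: report whether a systemd unit is running.
# Defaults to the gbotnotify unit from this post.
check_service() {
    unit="${1:-gbotnotify}"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is running"
    else
        echo "$unit is NOT running"
        # Show the last 20 journal lines to help find out why
        journalctl -u "$unit" -n 20 --no-pager
    fi
}

# On the server, run it like this:
# check_service gbotnotify
```

The call is left commented out so that the snippet can be pasted without side effects; run check_service on the server itself.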

n8n Workflow setup

Inside n8n, paste the code below. It’ll create a Webhook node and an IF node.

Then, from the true branch, connect whatever nodes you like.

Here is a simple example.

  • This workflow starts when the webhook is triggered, then sends a message to Telegram and appends a row to a Google Sheet.
  • You can extend this workflow with a Function Node that adds the date and time to the data, so a graph can be created to monitor the crawl rate.
  • Here, the data is the text “GoogleBot HIT”, which you can modify as per your need. (I’m just showing you an example.)
  • Once the data is inside n8n, you can do a lot.
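As an alternative to adding the date and time inside n8n, the timestamp could also be sent from the server side. The sketch below is an assumption, not part of the original script: it presumes the Nginx "combined" log format (timestamp in square brackets), and the names extract_timestamp, notify_with_timestamp and the Timestamp field are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: print the text between the first "[" and the next "]"
# of a log line, which is the timestamp in the Nginx "combined" format.
extract_timestamp() {
    printf '%s\n' "$1" | sed -n 's/^[^[]*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: the same POST as in GoogleBot.sh, plus a Timestamp field.
# --data-urlencode escapes the "/", ":", "+" and space in the timestamp.
# Expects webhookURL to be set, as in GoogleBot.sh.
notify_with_timestamp() {
    curl --silent --output /dev/null \
         -X POST \
         "$webhookURL" \
         -d 'GoogleBot=Yes' \
         --data-urlencode "Timestamp=$(extract_timestamp "$1")"
}

# Made-up sample line, used only to show what gets extracted.
sample='66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
extract_timestamp "$sample"
```

Inside the while loop of GoogleBot.sh, notify_with_timestamp "$line" would take the place of the plain curl call, and the Timestamp value would then be available in the webhook data.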

Caution: If you’re using Google Sheets, every hit counts against the API usage limits. Refer - Usage limits  |  Google Sheets  |  Google for Developers

Possibilities are endless. :smiley:


Really cool. Thanks a lot for sharing @mcnaveen !


Very cool! Love all the uses for n8n that you come up with :slight_smile:
