I know GoogleBot crawls multiple times a day, but I wanted to monitor its visits and analyze the data.
So I wrote a bash script that calls an n8n webhook URL whenever GoogleBot visits.
Server-side setup
- Change the logFile path to match your setup. (Make sure to change it if you’re using another web server such as Apache or OpenLiteSpeed.)
- You can use the n8n webhook production URL.
- Create a new file called GoogleBot.sh and paste the code below.
#!/bin/bash
# Webhook URL
webhookURL="PASTE n8n WEBHOOK URL HERE"
# This is the default Nginx log path.
# You can replace the path according to your setup
logFile=/var/log/nginx/access.log
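# -F keeps following the log after rotation; -n0 starts at the end of the file
tail -Fn0 "$logFile" | \
while read -r line ; do
    # -q tests for a match without printing the line
    if echo "$line" | grep -q "Googlebot"
    then
        curl --silent --output /dev/null \
            -X POST \
            "$webhookURL" \
            -d 'GoogleBot=Yes'
    fi
done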
Save and close it.
Then make it executable with this command: chmod +x GoogleBot.sh
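To check the matching logic without waiting for a real crawl, the same grep test can be tried against sample log lines. This is only a minimal sketch, assuming the default Nginx combined log format. The helper name is_googlebot and both sample lines are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 when a log line mentions Googlebot.
is_googlebot() {
    printf '%s\n' "$1" | grep -q "Googlebot"
}

# Made-up sample lines in the default Nginx "combined" format.
bot_line='66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
user_line='203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64)"'

is_googlebot "$bot_line" && echo "bot line: match"
is_googlebot "$user_line" || echo "user line: no match"
```

You can also simulate a visit against your own server with curl -A "Googlebot" followed by a URL on your site, which should write a matching line to the access log.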
To keep it running, we will create a systemd service.
# Create a new service file
touch /etc/systemd/system/gbotnotify.service
# Open the service file for editing
nano /etc/systemd/system/gbotnotify.service
- Now paste the code below.
- If you’re using a different location, make sure to change WorkingDirectory and the GoogleBot.sh file path.
[Unit]
Description=Fire a webhook when GoogleBot visits your website.
Requires=network.target
After=network.target
[Service]
Type=simple
WorkingDirectory=/root
ExecStart=/bin/bash /root/GoogleBot.sh
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
Save and close it.
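Because the unit uses Restart=always with RestartSec=3, a wrong script or log path would make systemd retry every few seconds. A quick check before starting the service can catch that. This is a minimal sketch: the helper name preflight is hypothetical, and the two paths are assumptions that should match ExecStart in the unit file and logFile in GoogleBot.sh.

```shell
#!/bin/bash
# Assumed paths: replace with the values from ExecStart and logFile.
scriptPath=/root/GoogleBot.sh
logFile=/var/log/nginx/access.log

# Hypothetical helper: succeeds only if both files are readable.
preflight() {
    local script=$1 log=$2
    [ -r "$script" ] || { echo "cannot read script: $script"; return 1; }
    [ -r "$log" ] || { echo "cannot read log: $log"; return 1; }
    echo "ok"
}

preflight "$scriptPath" "$logFile" || echo "Fix the paths before starting the service."
```

Run it as the same user the service runs as (root in this setup), since read permissions on the Nginx log can differ between users.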
Now reload systemd, then enable and start the service with these commands.
systemctl daemon-reload
systemctl enable gbotnotify
systemctl start gbotnotify
n8n Workflow setup
Inside n8n, paste the code below. It’ll create a Webhook node and an IF node.
Then, from the true branch, connect whatever nodes you like.
Here is a simple example.
- This workflow starts when the webhook is triggered, then sends a message to Telegram and appends the data to a Google Sheet.
- You can extend this workflow with a Function node that adds the date and time to the data, so a graph can be created to monitor the crawl rate.
- Here, data means “GoogleBot HIT”, which you can modify as needed. (I’m just showing you an example.)
- Once the data is inside n8n, you can do a lot.
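As an alternative to adding the date and time inside n8n, the script itself could send the timestamp taken from the log line. This is a minimal sketch, assuming the default Nginx log format, where the time sits between square brackets. The helper names log_timestamp and send_hit and the field name time are hypothetical, and the workflow would need to read that extra field.

```shell
#!/bin/bash
webhookURL="PASTE n8n WEBHOOK URL HERE"

# Hypothetical helper: print the text between the first [ and ] of a log line.
log_timestamp() {
    printf '%s\n' "$1" | sed -n 's/^[^[]*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: POST the hit together with its timestamp.
send_hit() {
    curl --silent --output /dev/null \
        -X POST "$webhookURL" \
        -d 'GoogleBot=Yes' \
        --data-urlencode "time=$(log_timestamp "$1")"
}

# Made-up sample line; this only prints the timestamp and does not call the webhook.
sample='66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "Googlebot/2.1"'
log_timestamp "$sample"
```

Inside the while loop of GoogleBot.sh, send_hit "$line" would take the place of the original curl call.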
Caution: If you’re using Google Sheets, every hit adds to your API usage, which is subject to limits. Refer to: Usage limits | Google Sheets | Google for Developers
Possibilities are endless.