This is still under consideration, so I'd appreciate any advice.
Objectives
- Full automation from error detection to notification and resolution
- Error summarization and automated fixes with an AI Agent
- Achieve automatic updates of Infrastructure as Code (IaC)
Flow
- EC2 Error Occurs
  - An error occurs on the EC2 instance
- Error Detection & Trigger
  - CloudWatch detects the error and triggers Lambda
- Error Forwarding
  - Lambda sends error details via HTTP POST
- n8n Webhook
  - n8n webhook receives the request and initiates processing
- Error Analysis with AI
  - LLM Chain generates a prompt → Gemini invoked → Error details summarized
- Google Chat Notification
  - Error summary automatically sent to Google Chat
- Automated Error Fix
  - AI Agent generates a fix proposal
  - Infrastructure code automatically updated via GitHub MCP
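To make the "Error Forwarding" step more concrete, here is a minimal sketch of what the Lambda could look like. It assumes the trigger is a CloudWatch Logs subscription filter, which delivers log data base64-encoded and gzip-compressed under `event["awslogs"]["data"]`; a CloudWatch alarm would deliver a different event shape. The `N8N_WEBHOOK_URL` environment variable and the payload field names are hypothetical choices for illustration, not something I have settled on.

```python
import base64
import gzip
import json
import os
import urllib.request

# Hypothetical: the n8n webhook URL is supplied through an environment variable.
WEBHOOK_URL = os.environ.get("N8N_WEBHOOK_URL", "")


def decode_logs_event(event):
    """Decode the base64-encoded, gzip-compressed payload that CloudWatch Logs
    passes to a subscribed Lambda under event["awslogs"]["data"]."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def build_payload(decoded):
    """Reduce the decoded log data to the fields the n8n workflow needs.
    The field selection here is an assumption for illustration."""
    return {
        "logGroup": decoded.get("logGroup"),
        "logStream": decoded.get("logStream"),
        "messages": [e["message"] for e in decoded.get("logEvents", [])],
    }


def handler(event, context):
    """Lambda entry point: forward the error details to n8n via HTTP POST."""
    payload = build_payload(decode_logs_event(event))
    request = urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A short timeout keeps a slow webhook from holding the Lambda open.
    with urllib.request.urlopen(request, timeout=10) as response:
        return {"status": response.status, "forwarded": len(payload["messages"])}
```

Only the standard library is used, so the function should deploy without a dependency layer. Keeping the decoding and payload-building in separate functions also makes them easy to test locally without calling the webhook.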
Tech Stack
- AWS CDK: Infrastructure as Code for AWS environment
- n8n: Workflow automation after error detection
- Gemini AI: Error summarization and prompt processing
- Google Chat: Notification channel
- GitHub MCP: Automatic updates of infrastructure code
What This Enables
- End-to-end automation from incident → notification → resolution
- Faster error analysis with AI
- Continuous alignment with the latest state through IaC auto-updates
Considerations
- Accuracy and risk management of automated fixes
- Scope of application (all errors vs. specific patterns)
- Rollback strategy design
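One way I am thinking about the considerations above is a small gate that decides whether a fix proposal may go forward at all. The sketch below is only an illustration under my own assumptions: the pattern allowlist, the `FixProposal` fields, the `confidence` score, and the thresholds are all hypothetical, and nothing here is part of n8n or GitHub MCP.

```python
import re
from dataclasses import dataclass

# Hypothetical allowlist: only errors matching these patterns qualify
# for an automated fix; everything else is notification only.
AUTO_FIX_PATTERNS = [
    re.compile(r"No space left on device"),
    re.compile(r"OutOfMemoryError"),
]


@dataclass
class FixProposal:
    error_message: str
    changed_files: list
    confidence: float  # hypothetical score in [0, 1] reported by the AI Agent


def decide(proposal, min_confidence=0.8, max_files=3):
    """Return "pull_request" or "notify_only"; a direct commit is never an option."""
    known = any(p.search(proposal.error_message) for p in AUTO_FIX_PATTERNS)
    small = 0 < len(proposal.changed_files) <= max_files
    if known and small and proposal.confidence >= min_confidence:
        return "pull_request"
    return "notify_only"
```

The idea is to start with specific patterns rather than all errors, and to route every accepted fix through a pull request instead of a direct commit. That would keep a human review step in place, and rollback could then be as simple as reverting the merge, though I have not validated this in practice.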
